FOREWORD


The Thirty-First Conference on the Design of Experiments in Army Research and
Development and Testing was held 23-25 October 1985. The Army Mathematics
Steering Committee (AMSC) is the sponsor of this series of meetings, and its
subcommittee on Statistics and Probability organizes the scientific phase of
each of them. Members of this subcommittee would like to thank Professor
Bernard Harris for extending an invitation to hold this conference at the
Mathematics Research Center, The University of Wisconsin, Madison, Wisconsin.
His work, as chairperson for local arrangements, was a big factor in the
success of this meeting.


This year eighteen contributed papers were given in the clinical and technical
sessions. Most of these were presented by Army scientists. The titles of the
sessions give some indication of the statistical areas treated: (1) Final
Series and Multivariate Analysis, (2) Consistence Analysis, (3) Experimental
Design, (4) Statistical Modeling, (5) Data Analysis, (6) Reliability and
Quality Control. For the invited speaker phase of the conference, the Program
Committee was pleased to obtain the services of the following nationally known
scientists to talk on topics of current interest to Army personnel:


Speaker and Affiliation                 Title of Address

Professor Jerome Sacks                  Keynote Address
University of Illinois at
Urbana-Champaign

Professor Marion R. Reynolds, Jr.       Approaches to Statistical
Virginia Polytechnic Institute          Validation of Simulation Models
and State University

Dr. Daryl Pregibon                      An Expert System for Data
Bell Laboratories                       Analysis

Dr. Howard Wainer                       How to Display Data Badly
Educational Testing Service

Professor Gouri K. Bhattacharyya        Accelerated Life Tests


Since the Army analytic community is becoming ever more involved in the use of
expert opinion and the related approaches to the analysis of new systems
performance measures, it seemed an ideal time to have a special session to
provide the audience with new insight into this important area. The AMSC is
indebted to Professor Nozer D. Singpurwalla of George Washington University
for organizing and chairing this feature session entitled "Using Expert
Opinions and Expert Systems in Reliability and Maintainability". We note below
the titles of the addresses given by the four speakers in this informative
session.




HUMAN  FACTORS  AFFECTING  SUBJECTIVE  JUDGMENTS 


Mary  A.  Meyer,  Energy  Technology  Group,  Los  Alamos  National  Laboratories 

SOURCES  AND  EFFECTS  OF  CORRELATION  OF  EXPERT  OPINIONS 

Jane  M.  Booker,  Statistics  Group,  Los  Alamos  National  Laboratories 

USE OF EXPERT OPINION IN RELIABILITY ASSESSMENT OF THE M-1 ABRAMS TANK

Bobby Bennett, U.S. Army Material Systems Analysis Agency

A MATHEMATICAL THEORY OF TESTABILITY

Alan Currit, Systems Product Division, IBM, Rochester

Professor Emanuel Parzen, Department of Statistics at Texas A&M University, was
selected by the AMSC to receive the Fifth Wilks Award for Contributions to
Statistical Methodologies in Army Research Development and Testing. He richly
deserves this honor for his many significant contributions to time series
modeling and analysis, stochastic processes, statistical theory (including his
seminal paper on density estimation), and his recent work on the foundations
and generalized methodologies in data analysis. His latest work will
undoubtedly have a very pronounced effect on the theory and practice of
statistics in the years to come.

The AMSC has requested that the proceedings of the 1985 conference be
distributed Army-wide so that the information contained therein can assist
scientists with some of their statistical problems. Finally, committee
members would like to thank the Program Committee for all the work it did in
putting together this scientific meeting.
APPROACHES TO STATISTICAL VALIDATION OF SIMULATION MODELS
Marion  R.  Reynolds,  Jr. 

Virginia  Polytechnic  Institute  and  State  University 
ABSTRACT

The  process  of  validating  a  stochastic  simulation  model  usually  involves 
the  comparison  of  data  generated  by  the  model  with  corresponding  data  from 
the real system. One method of making this comparison is to test the
hypothesis  that  the  distribution  of  model  output  is  the  same  as  the 
distribution  of  the  corresponding  variable  in  the  real  system.  Since  no 
model  is  a  perfect  reflection  of  the  real  system,  a  more  realistic 
formulation  is  to  test  the  hypothesis  that  the  model  is  close  enough  for  the 
purposes  of  the  model  user.  An  alternate  approach  to  validation  considers 
the  error  that  results  when  the  model  is  used  to  predict  the  behavior  of  the 
real  system.  In  order  to  help  the  model  user  evaluate  the  predictive  ability 
of  the  model,  confidence  intervals  for  expected  error  or  prediction  intervals 
for  actual  error  can  be  constructed. 
1. SIMULATION MODELS


Stochastic  simulation  models  are  now  widely  used  in  many  fields  to  model 
complex  systems  when  other  types  of  models  can  not  be  used.  In  many  cases 
the  system  being  modeled  will  include  many  simpler  processes  interacting  in  a 
dynamic  setting  so  that  it  is  not  possible  to  carry  through  a  direct 
mathematical  analysis.  The  nature  of  a  simulation  model  usually  means  that 
the  basic  assumptions  and  structure  of  the  model  are  not  readily  apparent  to 
the  model  user  so  that  model  validation  is  particularly  important  for  these 
models. 

Models  can  be  constructed  for  several  purposes,  for  example  to  gain  basic 
understanding  of  the  system  being  modeled,  to  compare  different  management 
strategies  with  the  idea  of  selecting  a  good  strategy,  or  to  predict  the 
behavior  of  the  system  being  modeled.  In  each  of  these  cases  some  inference 
obtained  using  the  model  will  be  applied  to  the  real  system.  In  most 
situations  the  ability  of  the  model  to  predict  system  behavior  will  be 
critical to the effectiveness of the model. The main purpose of the model will
usually  determine  the  predictive  ability  required  of  the  model  and  this  in 
turn  will  influence  the  approach  to  validation  that  is  required. 

2. VALIDATION

Before  a  simulation  model  can  be  used  with  confidence,  the  model  user 
needs  to  know  whether  the  model  is  a  reasonable  representation  of  the  real 
system  so  that  inferences  or  predictions  obtained  from  the  model  are  useful 
for  the  real  system.  It  is  the  need  for  this  type  of  information  that  leads 
to  issues  of  validation  and  assessment  of  the  model. 

In discussing model validation it is usually not helpful to think in
absolute terms of a model being either valid or invalid, but rather in terms
of degree of validity or, better yet, in terms of degree of usefulness. The
usefulness  of  a  model  will  depend  on  the  purpose  of  the  model  and  on  the 
conditions  under  which  it  is  used.  For  example,  a  model  may  be  useful  for 
determining  the  relative  performance  of  two  management  strategies  but  not 
very  useful  for  providing  accurate  and  detailed  predictions  of  future  system 
behavior.  A  model  which  is  useful  for  providing  predictions  for  5  years  in 
the  future  may  not  provide  useful  predictions  for  15  years  in  the  future. 

A  useful  way  to  think  about  the  nature  of  validation  has  been  given  by 
Van  Horn  (1971).  He  defined  validation  as  "the  process  of  building  an 
acceptable  level  of  confidence  that  an  inference  about  a  simulated  process  is 
a correct or valid inference for the actual process". An important point
here is that validation is a process and not a one-time exercise.

Ideally, the  validation  process  should  be  carried  out  during  the  model 
building  process  (Sargent  (1979))  as  well  as  after  the  model  is  essentially 
complete.  Another  important  point  in  Van  Horn's  definition  is  that 
validation  is  a  process  of  building  confidence  in  the  model  and  not 
necessarily  the  process  of  "proving"  that  the  model  is  valid. 

It  may  be  helpful  to  make  a  distinction  between  validation  and  what 
Fishman  and  Kiviat  (1968)  have  called  verification.  Verification  is  the 
process  of  determining  whether  the  simulation  model  behaves  as  the  model 
builders intended. For example, "debugging" the computer program is an
important  part  of  the  verification  process.  The  validation  process  extends 
beyond  the  verification  process  since  a  model  which  behaves  exactly  as  the 
model  builders  intended  still  may  not  be  useful  for  drawing  inferences  about 
the  real  system. 
3. APPROACHES TO VALIDATION


Some  of  the  discussion  of  validation  in  the  simulation  literature  has 
focused on philosophical issues. Discussions of some of the issues involved
are given in McKenney (1967), Naylor and Finger (1967), Schrank and Holt
(1967), and Shannon (1975). Balci and Sargent (1984) give an up-to-date
bibliography of papers dealing with various aspects of model validation.

One  direct  approach  to  validation  involves  examining  the  model  for  "face 
validity",  that  is,  determining  whether  the  assumptions  and  structure  of  the 
model  seem  reasonable  to  people  who  are  knowledgeable  about  the  real  system 
(see,  for  example,  Law  (1982)).  This  examination  of  assumptions  should,  of 
course,  be  carried  out  during  the  modeling  process  as  the  modeler  develops  a 
conceptual  model  in  collaboration  with  people  who  are  familiar  with  the 
system.  After  the  model  has  been  constructed  other  "independent"  experts  can 
be  used  to  evaluate  the  model. 

In addition to examining assumptions for conformance to existing
knowledge and theory, empirical testing of these assumptions can be carried
out (Naylor and Finger (1967)). In this context the use of sensitivity
analysis may help to identify which assumptions are most critical so that
attention can be focused on these critical assumptions (Van Horn (1972)). In
addition to a sensitivity analysis conducted in the likely range of model
parameters, an evaluation of model performance can be done at the extremes of
the parameter values (Sargent (1983)).

One of the most important tests to which a model can be subjected in the
validation process is the comparison of data obtained from the real system
with corresponding data generated from the model. If there is close
agreement, in some sense, between these two data sets then this will increase
confidence in the model. Some authors argue that the ability of the model to
predict the behavior of the real system is the most important test of a
model.

Confidence  in  the  model  will  be  higher  when  the  data  used  in  the 
validation  of  the  model  is  independent  of  the  data  used  in  constructing  the 
model.  If  it  is  not  possible  to  obtain  separate  data  for  validation  then  one 
approach  is  to  split  the  existing  data  into  two  sets.  One  set  can  be  used 
for  constructing  the  model  and  the  other  set  can  be  used  for  validating  the 
model.  In  many  cases  the  data  used  in  constructing  and  validating  a  model 
will  be  historical  data  that  has  been  collected  on  the  existing  system  or  a 
similar  system.  Ideally  the  model  should  be  tested  by  its  ability  to  predict 
the  behavior  of  the  system  in  the  future.  This  may  not  be  immediately 
possible  either  because  the  real  system  may  not  yet  exist  or  because  there  is 
not  enough  time  to  wait  for  future  observations  on  the  real  system.  This 
paper  will  concentrate  on  the  case  where  validation  data  is  available  since 
this  is  the  case  where  statistical  approaches  can  be  used  in  comparing  the 
model  and  the  real  system. 
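
The data-splitting idea described above can be sketched in Python. This is only an illustration: the function name `split_for_validation`, the 50/50 default split, the fixed seed, and the plot identifiers are all assumptions introduced here, not part of the original discussion.

```python
import random

def split_for_validation(records, fraction_fit=0.5, seed=0):
    """Randomly split records into a model-building set and a validation set."""
    if not 0.0 < fraction_fit < 1.0:
        raise ValueError("fraction_fit must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_fit = int(round(fraction_fit * len(shuffled)))
    return shuffled[:n_fit], shuffled[n_fit:]

plots = list(range(20))  # hypothetical plot identifiers
fit_set, validation_set = split_for_validation(plots)
```

A random split is only one possibility; with historical data it may be more natural to build the model on earlier records and validate on later ones.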

4.  EXAMPLE 

When  discussing  various  statistical  techniques  that  are  useful  in 
validation  it  may  be  helpful  to  think  in  terms  of  a  specific  type  of 
simulation  model  as  an  example.  Consider  the  model  PTAEDA  developed  by 
Daniels  and  Burkhart  (1975)  for  simulating  the  growth  of  trees  in  forest 
stands.  This  type  of  model  is  designed  to  model  stand  growth  over  time  so 
that  various  management  strategies  or  the  effects  of  various  natural 
phenomena  can  be  evaluated.  The  volume  of  wood  in  a  stand  at  some  future 
time  is  one  of  the  main  system  variables  of  interest,  but  other  variables 
such  as  the  number  of  trees  in  various  diameter  classes  may  also  be  of 
interest. In this model individual trees within the stand are assigned
initial  coordinate  locations  and  sizes  at  an  age  corresponding  to  the  onset 
of  competition.  Then  annual  diameter  and  height  growth  of  each  tree  is 
simulated  as  a  function  of  tree  size,  site  quality,  age,  and  an  index 
reflecting  competition  from  neighboring  trees.  Tree  growth  is  adjusted  by  a 
random  component  representing  genetic  and/or  microsite  variability.  Each 
year  each  tree  survives  with  a  certain  probability  and  this  survival 
probability  is  a  function  of  tree  size  and  competition.  The  wood  volumes 
for  individual  trees  at  the  end  of  the  simulation  period  are  obtained  by 
substituting diameter and height values into tree volume equations.
Estimates of wood yield per unit area are obtained by summing the individual
tree volumes and multiplying by an appropriate expansion factor.

5. NOTATION

Suppose that the simulation model is constructed in such a way that p
input variables represented by $X = (X_1, X_2, \ldots, X_p)$ are used to
generate an output variable represented by Z. The input variables are usually
selected to correspond to the most important observable input variables in the
real system. The output variable Z in the model corresponds to some variable Y
that is of interest in the real system. For example, for a forest stand
simulator designed to predict stand volume at a future time, X might
represent input variables such as site quality, stand age at the future time,
and some measure of current density; Z would correspond to simulated stand
volume from the model and Y would correspond to the actual stand volume at
the future time. In most applications it will be reasonable to treat both Y
and Z as random variables whose distributions depend on the levels of X. Y
is a random variable because the value of Y cannot be determined by
determining the values of a finite number of input variables, and Z is of
course a random variable because the model contains stochastic elements.


Since  the  distributions  of  Y  and  Z  depend  on  X  it  will  be  convenient  to  work 
with F(y|x) and G(z|x), the conditional distribution functions of Y and Z,
respectively. 

Model  users  will  usually  be  interested  in  using  a  model  to  make  two 
general  types  of  inferences  about  the  real  system  being  modeled.  The  first 
type of inference is concerned with a parameter or characteristic associated
with the distribution of the variable Y from the real system. The parameter
that is usually of most interest is the conditional mean E(Y|x); other
parameters that might be of interest are $P(Y \le y \mid x)$, the probability
that the system output is below a specified value y, and the variance
Var(Y|x). All
of  these  parameters  are  functions  of  the  input  variables  X.  For  example  a 
model  user  might  be  interested  in  estimating  the  average  volume  for  stands  of 
a  particular  type  where  the  type  of  stand  is  determined  by  specifying  the 
input  variables  age,  site  quality,  and  density.  Alternately,  the  user  might 
want  to  estimate  the  probability  that  a  stand  of  a  particular  type  has  a 
volume  below  an  economically  determined  lower  threshold. 

The  second  type  of  inference  is  concerned  with  predicting  an  actual 
value  of  Y  that  is  to  be  observed  when  X  is  at  some  specified  value.  For 
example  the  model  user  might  be  interested  in  a  particular  stand  and  want  to 
predict  the  volume  on  this  stand  (as  opposed  to  the  average  volume  on  all 
stands  of  this  type).  The  usefulness  of  the  model  for  making  either  type  of 
inference depends on how close the conditional distribution of Z, given X =
x, is to the conditional distribution of Y, given X = x. The best that could
be hoped for is that these two conditional distributions are equal. Even
then, in any trial of the model, the simulated value of Z will not
necessarily be close to the corresponding observed value of Y since both Z
and Y are random variables.


Suppose that observations from the real system are available for n
different sets of conditions, and for the ith set of conditions $m_i$
observations from the real system are available. Let

$Y_{ij}$ = jth observation from the real system under the ith
set of conditions

and

$$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{im_i}).$$

For  example,  data  on  total  wood  volume  may  be  available  for  n  different 
types of plots. In this example each plot may be distinct so that $m_i = 1$ for
all i. Also let

$X_i = (X_{i1}, \ldots, X_{ip})$ = input variables for the ith set of conditions.

Corresponding to the ith set of conditions represented by $X_i = x_i$, the
simulation model can be run $m_i'$ times to generate $m_i'$ independent
simulated values which can be represented by

$$Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{im_i'}).$$

In some cases it may be useful to use the components of $Y_i$ and $Z_i$
individually, but in other cases the averages may be used. Then

$$\bar{Y}_i = \sum_{j=1}^{m_i} Y_{ij} / m_i$$

is an estimator of $E(Y \mid x_i)$, the mean of the system at the ith set of
conditions, and

$$\bar{Z}_i = \sum_{j=1}^{m_i'} Z_{ij} / m_i'$$

is an estimator of $E(Z \mid x_i)$, the mean of the model at the ith set of
conditions. The bias or expected error in the model at $X = x_i$ is
$E(Y - Z \mid x_i)$ and an unbiased estimator of this bias is

$$D_i = D(x_i) = \bar{Y}_i - \bar{Z}_i.$$

It may also be useful to think of $\bar{Z}_i$ as a predictor of $\bar{Y}_i$
before $\bar{Y}_i$ is observed and in this case $D_i$ is the prediction error.
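
As a small illustration of this notation, the following Python sketch computes the estimated bias D_i (mean of the real observations minus mean of the simulated values) at each set of conditions. The function name `bias_estimates` and the numbers are hypothetical and serve only to make the definitions concrete.

```python
from statistics import mean

def bias_estimates(system_obs, model_obs):
    """Return D_i = mean(Y_i) - mean(Z_i) for each set of conditions i."""
    if len(system_obs) != len(model_obs):
        raise ValueError("system and model must cover the same sets of conditions")
    return [mean(y) - mean(z) for y, z in zip(system_obs, model_obs)]

# Hypothetical data for n = 3 sets of conditions.
system_obs = [[10.0, 12.0], [20.0], [31.0, 29.0, 30.0]]      # Y_i with m_i = 2, 1, 3
model_obs = [[9.0, 11.0, 13.0], [18.0, 20.0], [30.0, 32.0]]  # Z_i with m_i' = 3, 2, 2
d = bias_estimates(system_obs, model_obs)
```

Note that the number of real observations and the number of simulation runs may differ from one set of conditions to the next, which is why the data are held as lists of lists.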

6. HYPOTHESIS TESTING

In  developing  a  model  based  on  a  finite  number  of  input  variables  X,  the 
best  model  that  could  be  achieved  would  have  the  conditional  distribution  of 
Z  given  X  =  x  equal  to  the  conditional  distribution  of  Y  given  X  =  x.  Thus 
a  natural  way  to  formulate  the  validation  problem  is  as  the  problem  of 
testing the null hypothesis that Z and Y have the same conditional
distributions. Let A be a set representing the range of input variables for which
it  is  desirable  to  validate  the  model.  Then  the  problem  can  be  stated 
formally  as  one  of  testing 

$$H_0: F(\cdot \mid x) = G(\cdot \mid x) \quad \text{for all } x \in A.$$

The alternative is that F and G are not equal for at least one $x \in A$.

Ideally the set of validation data should be representative of A in some way,
for example, a random sample from A. In practice it may not be feasible to
take a random sample and thus whatever data is available may have to be used.
For purposes of building confidence in the model, data that represents the
extremes of A might actually be better than a random sample. If the
validation data does not adequately cover A then of course the conclusions about
model validity that can be drawn from the data would be restricted to the
subset of A represented by the data.

A  reasonable  interpretation  of  the  hypothesis  testing  formulation  of 
the  validation  problem  is  that  the  test  is  being  carried  out  to  determine 
whether  there  is  any  indication  that  the  model  does  not  represent  the  real 
system.  If  the  null  hypothesis  is  not  rejected  then  this  is  interpreted  to 
mean that there is no strong evidence of model inadequacy. It does not
of course mean that the model is a perfect reflection of the real system
or that the model cannot be improved upon, since the power of the test used
may not be high. On the other hand a decision to reject the null hypothesis
does not necessarily mean that the model is not useful. Rejection in this
case would be taken as an indication that there is room for improvement and
that the data should be examined for indications of areas for model
improvement.

In  some  cases  the  requirement  that  F  and  G  be  equal  may  be  too  strict 
and  a  test  for  equal  conditional  means  may  be  sufficient.  In  this  case 
the  null  hypothesis  would  be 

$$H_0': E(Y \mid x) = E(Z \mid x) \quad \text{for all } x \in A.$$

If $m_i$ and $m_i'$ are small there may not be enough information at the set
of conditions represented by $X = x_i$ to provide a test of either $H_0$ or
$H_0'$ with reasonable power. In this case it would be reasonable to apply a
test at each set of conditions and then use some method for combining
independent tests. One well known method of combining independent tests was
developed by Fisher (1938). Let $T_i$ be the test that is applied at the ith
set of conditions and let $\alpha_i$ represent the observed significance level
of the test, i.e. $\alpha_i$ is the probability of a value of $T_i$ that is as
extreme or more extreme than the observed value of $T_i$. If the distribution
of $T_i$ is continuous then the distribution of $\alpha_i$ is uniform on (0,1)
when the null hypothesis is true. From this it can be shown that
$-2 \sum_{i=1}^{n} \log \alpha_i$ has a chi-square distribution with 2n degrees
of freedom when the null hypothesis is true. When the $\alpha_i$ are small,
$-2 \sum_{i=1}^{n} \log \alpha_i$ will be large and Fisher's test rejects the
null hypothesis when $-2 \sum_{i=1}^{n} \log \alpha_i$ exceeds an appropriate
critical value from the chi-square table.

For other methods of combining independent tests see, for example, Oosterhoff
(1969). Alternatively, a procedure such as the analysis of variance could
be  used  to  combine  information  if  the  usual  assumptions  such  as  equality 
of  variances  at  the  different  conditions  are  reasonable. 
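
Fisher's method of combining the observed significance levels can be sketched in Python as follows. The function names and the four example significance levels are hypothetical. Instead of a chi-square table, the sketch uses the standard closed form for the upper tail of a chi-square distribution with an even number of degrees of freedom, which is exact and needs only the standard library.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

def fisher_combined_test(p_values):
    """Return Fisher's statistic -2 * sum(log p_i) and its chi-square p-value."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("significance levels must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))

# Hypothetical observed significance levels from n = 4 sets of conditions.
stat, combined_p = fisher_combined_test([0.04, 0.20, 0.10, 0.35])
```

The combined test rejects at a chosen level when `combined_p` falls below that level, which is equivalent to comparing the statistic with the tabulated critical value.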

7. CHOICE OF A TEST

For testing $H_0$ a test such as the two-sample Kolmogorov-Smirnov test
for the equality of two distribution functions could be used. This test
could be applied to $Y_i$ and $Z_i$ at each set of conditions and then
information from all tests could be combined together. This type of test has
the disadvantage that it is designed for the very general alternative
$F(\cdot \mid x) \ne G(\cdot \mid x)$ for some $x \in A$ and thus may not have
high power for specific alternatives that may be of primary interest.
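
A minimal Python sketch of the two-sample Kolmogorov-Smirnov comparison at a single set of conditions is given below. The function name and data are hypothetical, and the p-value uses the large-sample Kolmogorov series, which is an approximation and can be rough for the small samples typical of validation data; exact tables or an established statistics library would be preferable in practice.

```python
import math

def ks_two_sample(y, z):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    if not y or not z:
        raise ValueError("both samples must be non-empty")
    ys, zs = sorted(y), sorted(z)
    n, m = len(ys), len(zs)
    i = j = 0
    d = 0.0
    # Step through the pooled values, comparing the two empirical CDFs.
    while i < n and j < m:
        v = min(ys[i], zs[j])
        while i < n and ys[i] == v:
            i += 1
        while j < m and zs[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Large-sample approximation to P(D >= d) under the null hypothesis.
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, 1.0  # the tail probability is essentially 1 in this range
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))

# Hypothetical data at one set of conditions.
y_i = [41.2, 38.5, 44.0, 39.9, 42.7]        # real-system observations
z_i = [36.1, 37.4, 35.0, 38.0, 36.8, 34.9]  # simulated values
d_stat, p_approx = ks_two_sample(y_i, z_i)
```

The approximate significance levels obtained at the n sets of conditions could then be combined, for example by Fisher's method.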

For testing $H_0'$ various parametric and nonparametric tests could be used.
If normality and constant variance can be assumed then the analysis of
variance is a reasonable choice where there are two treatments (real and
simulation) and n blocks corresponding to the n sets of conditions. If
constant variance cannot be assumed then individual two-sample t-statistics can
be computed at each point and then combined into an overall test. If $H_0'$ is
rejected then the individual t-statistics would be useful in indicating
places where the model does not work well.
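
The per-point t-statistics can be sketched in Python as below. The text does not say which form of the two-sample t-statistic is intended; this sketch uses the Welch (unequal-variance) form as one reasonable choice, and the function names are hypothetical. The classical pooled-variance statistic could be substituted if equal variances for system and model are plausible at each point.

```python
import math
from statistics import mean, variance

def welch_t(y, z):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    if len(y) < 2 or len(z) < 2:
        raise ValueError("each sample needs at least two observations")
    vy = variance(y) / len(y)   # estimated variance of the mean of y
    vz = variance(z) / len(z)   # estimated variance of the mean of z
    if vy + vz == 0.0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean(y) - mean(z)) / math.sqrt(vy + vz)
    df = (vy + vz) ** 2 / (vy ** 2 / (len(y) - 1) + vz ** 2 / (len(z) - 1))
    return t, df

def t_by_condition(system_obs, model_obs):
    """Apply welch_t at each set of conditions; large |t| flags trouble spots."""
    return [welch_t(y, z) for y, z in zip(system_obs, model_obs)]
```

Each statistic can be converted to an observed significance level with a t table, and the levels then combined into an overall test.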

If normality cannot be assumed then two-sample nonparametric tests
such as the Wilcoxon rank sum test can be used at each point and combined
into an overall test. In many applications data on the real system may be
scarce and there may be only one real observation $Y_{i1}$ at each $x_i$. In
this special case let $R_i$ be the rank of $Y_{i1}$ among the set
$Y_{i1}, Z_{i1}, Z_{i2}, \ldots, Z_{im_i'}$. Then, under the null hypothesis,
the distribution of $R_i$ is uniform on $1, 2, \ldots, m_i' + 1$. It is then
possible to develop simple nonparametric tests using $R_1, R_2, \ldots, R_n$
(see Reynolds, Burkhart and Daniels (1981)).
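
One simple test that can be built from these ranks is sketched below in Python. It is offered only as an illustration and is not necessarily one of the tests in Reynolds, Burkhart and Daniels (1981): it standardizes the sum of the ranks using the mean and variance of a discrete uniform distribution and refers the result to a normal distribution, which assumes n is reasonably large and that there are no ties. All names and data are hypothetical.

```python
import math
from statistics import NormalDist

def rank_of_observation(y, simulated):
    """Rank of the single real observation y among y and the simulated values."""
    return 1 + sum(1 for z in simulated if z < y)

def rank_sum_test(real_values, simulated_sets):
    """Normal-approximation test based on the ranks R_1, ..., R_n."""
    ranks = [rank_of_observation(y, zs)
             for y, zs in zip(real_values, simulated_sets)]
    centered, var = 0.0, 0.0
    for r, zs in zip(ranks, simulated_sets):
        k = len(zs) + 1                # R_i is uniform on 1, ..., k under the null
        centered += r - (k + 1) / 2.0  # subtract E(R_i)
        var += (k * k - 1) / 12.0      # Var(R_i) for a discrete uniform
    if var == 0.0:
        raise ValueError("need at least one simulated value")
    z_stat = centered / math.sqrt(var)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return ranks, z_stat, p_two_sided

# Hypothetical example: the real value exceeds every simulated value each time.
real_values = [10.0] * 6
simulated_sets = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]] * 6
ranks, z_stat, p_value = rank_sum_test(real_values, simulated_sets)
```

A large positive statistic suggests the model tends to underpredict the real system, and a large negative one that it tends to overpredict.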

8.  OTHER  HYPOTHESIS  TESTING  APPROACHES 

There is a potential problem with testing $H_0$ and $H_0'$ as previously
formulated. It may be known a priori that the model and the real system cannot be
identical and thus testing that the two are identical may not be very helpful.
A more realistic philosophy is to realize that an imperfect model can still be
useful and then try to determine how "close" the model needs to be to the real
system in order for the model to be useful for its intended purpose. Once
this is determined the validation data can be used to test the null hypothesis
that the model and system are close enough for the intended application of
the model (see, for example, Balci and Sargent (1981)). This approach
requires that a measure, say $\lambda(x)$, of the closeness of F and G be
developed. For example, this measure could be $\lambda(x) = E(Y - Z \mid x)$,
the expected difference between the real system output and the model output.
The null hypothesis could then be

$$H_0'': \lambda(x) \le \lambda_0 \quad \text{for all } x \in A$$

or, if the required agreement between the real system and model depends
on X, the null hypothesis could be

$$H_0''': \lambda(x) \le \lambda_0(x) \quad \text{for all } x \in A$$

where $\lambda_0(x)$ is the required agreement at X = x.

In order to test $H_0''$ or $H_0'''$ an appropriate test statistic must be
chosen. Balci and Sargent (1981) discuss the use of Hotelling's two-sample
$T^2$ test for this problem when several system response variables are observed
and the inferences are not conditional on X.

The hypothesis testing approaches discussed so far have all tested the
null hypothesis that the model is "valid" in some sense. With this
formulation the null hypothesis that the model is valid will be accepted unless
there is strong evidence to the contrary. This may lead to the acceptance
of a model that is not adequate if the power of the test being used is low.
This problem can be overcome somewhat if the power of the test at
alternatives of interest can be explicitly controlled.

Another approach that might be more reasonable from the model user's
point of view is to take the null hypothesis as the hypothesis that the
model is not valid. This null hypothesis would then be rejected and the
model accepted only if there is strong evidence that the model is valid.
In this way the burden of proof is on the model to prove itself before
being accepted for use. This approach may be difficult to implement in some
cases since the null hypothesis of an invalid model may be difficult to
explicitly formulate and test. Reynolds (1984) discusses this approach to
formulating the null hypothesis in one particular context.
9. ESTIMATING ERROR


The  logical  inconsistency  in  testing  the  null  hypothesis  that  the  model 
output  has  the  same  distribution  as  the  system  output  when  this  is  known  to  be 
impossible  has  already  been  pointed  out.  Testing  the  hypothesis  that  the 
model  is  close  enough  for  the  intended  purpose  of  the  model  may  be  more 
realistic,  but  there  may  be  problems  in  implementing  this  approach.  In  many 
cases  there  will  be  many  potential  users  of  the  model.  Even  if  these  users 
can  be  identified  it  may  be  difficult  to  get  these  users  to  accurately 
specify  the  required  degree  of  agreement  between  the  model  and  the  real 
system.  In  addition,  the  results  of  a  test  may  not  give  the  model  user  much 
feel  for  the  error  that  can  be  expected  when  the  model  is  used  to  draw 
inferences  about  the  real  system. 

One  way  around  the  problems  of  the  hypothesis  testing  approach  is 
through  the  approach  of  what  could  be  called  statistical  estimation.  This 
approach  is  concerned  with  estimating  the  error  that  is  likely  to  result  when 
the  model  is  used  to  estimate  a  parameter  or  to  predict  the  actual  output  of 
the  real  system.  When  the  objective  is  to  estimate  a  parameter  then  a 
confidence  interval  could  be  given  for  the  difference  (expected  error) 
between  the  mean  of  the  estimator  from  the  model  and  the  actual  value  of  the 
parameter.  When  the  objective  is  to  predict  actual  system  output  in  a  given 
situation  then  a  prediction  interval  for  the  difference  (prediction  error) 
between  the  prediction  and  the  observed  output  could  be  calculated.  In  this 
way  estimates  of  error  can  be  used  by  the  model  user  or  users  to  determine 
whether  the  performance  of  the  model  is  acceptable  for  various  purposes. 

The expected output of the system at $X = x_i$ is $E(Y \mid x_i)$, the expected
model output is $E(Z \mid x_i)$, and the expected difference or bias in the
model is $E(Y - Z \mid x_i)$. An unbiased estimator of this bias is
$D_i = \bar{Y}_i - \bar{Z}_i$. A confidence interval for this model bias can
be constructed to give the model user some indication of the average error
that will result when the model is used to estimate the mean response of the
system. If $m_i$ and $m_i'$ are not too small then confidence intervals for
bias at each point can be constructed.

In some cases the objective may be to predict actual system output at
some point. If Z_i is considered as a predictor of Y_i then the prediction
error is D_i = Y_i - Z_i. A prediction interval for this error can be
constructed to give the model user some indication of the size of the error
that may result when the model is used for predicting the response of the
system.

If the n sets of conditions can be considered as a random sample
from some population then the n values D_1, D_2, ..., D_n can be used to
construct a confidence interval for the average bias (averaged over the
distribution of X) or to construct a prediction interval for the prediction
error at a randomly selected value of X. Reynolds (1984) discusses the use
of confidence intervals and prediction intervals in validating models.
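
As a rough illustration of the estimation approach, the following Python sketch computes an approximate confidence interval for the average bias and an approximate prediction interval for a new prediction error from the differences D_i = Y_i - Z_i. The data values and the function name are hypothetical, and normal quantiles are used in place of Student-t quantiles as a simplifying assumption; for small n the t-based intervals would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def bias_intervals(d, level=0.95):
    """Approximate intervals from differences d[i] = y[i] - z[i].

    Returns (confidence interval for the average bias,
             prediction interval for a new difference).
    Normal quantiles are an assumption made for this sketch.
    """
    n = len(d)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    d_bar = mean(d)
    s = stdev(d)  # sample standard deviation (divisor n - 1)
    ci_half = q * s / sqrt(n)
    pi_half = q * s * sqrt(1.0 + 1.0 / n)
    return (d_bar - ci_half, d_bar + ci_half), (d_bar - pi_half, d_bar + pi_half)

# Hypothetical validation data: system output y and model output z
# observed under n = 8 sets of conditions.
y = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 9.5, 12.4]
z = [9.9, 11.9, 9.1, 11.6, 10.2, 11.3, 9.0, 11.8]
d = [yi - zi for yi, zi in zip(y, z)]

ci, pi = bias_intervals(d)
print("confidence interval for average bias:", ci)
print("prediction interval for prediction error:", pi)
```

The prediction interval is always the wider of the two, since it must cover the variability of a single new difference as well as the uncertainty in the estimated average.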

10.  REGRESSION 

In  most  cases  the  difference  between  the  model  and  the  real  system  will 
not  be  constant  but  instead  will  vary  depending  on  the  values  of  the  input 
variables.  This  means  that  the  bias  in  the  model  and  the  distribution  of  the 
prediction  error  will  depend  on  X.  In  addition  the  accuracy  required  of  the 
model  may  also  depend  on  X.  For  example,  for  certain  values  of  X  the  value  of 
Y may be large and the acceptable error may also be relatively large. But for
other values of X the value of Y may be small and the acceptable error may
also be relatively small. Thus it would be useful to be able to directly
relate the error or bias in the model to the levels of the input variables
X. One reasonable approach to this problem is to use regression methodology
to relate the error D to the input variables X. If this can be done then
model users can obtain information about model accuracy for different
conditions. In this case estimates of bias or prediction error would not be
restricted  to  the  n  validation  data  points  although  the  regression  model  for 
error  as  a  function  of  X  would  presumably  only  be  valid  within  the  region  of 
the  validation  data.  Reynolds  and  Chung  (1985)  discuss  the  use  of  regression 
methodology  in  validating  models  and  give  an  example  of  this  methodology 
applied  to  the  stand  simulator  PTAEDA. 
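
A minimal sketch of this idea, assuming a single input variable and a straight-line relationship between error and input, is given below in Python. The data, the function names, and the choice of a straight line are assumptions made only for illustration; they are not taken from Reynolds and Chung (1985).

```python
def fit_line(x, d):
    """Ordinary least squares fit of d = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    d_bar = sum(d) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    b = sxd / sxx
    a = d_bar - b * x_bar
    return a, b

# Hypothetical validation data: input level x and observed error d = y - z.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
d = [0.1, 0.3, 0.2, 0.6, 0.7, 0.9]
a, b = fit_line(x, d)

def predicted_bias(x_new):
    """Estimated model bias at x_new, restricted to the validation region."""
    if not (min(x) <= x_new <= max(x)):
        raise ValueError("x_new lies outside the region of the validation data")
    return a + b * x_new

print(predicted_bias(3.5))
```

The range check reflects the caution stated above: the fitted error model is presumably valid only within the region covered by the validation data.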

DISTRIBUTION UNDER DEPENDENCE OF NONPARAMETRIC TWO-SAMPLE TESTS

ABSTRACT. This paper aims to show how to develop the theory of
two-sample statistical procedures in a way that enables
statisticians to determine (in a practical and effective way)
how tests can be adjusted for dependence in the case that
dependence is modelled by a stationary time series. The
importance of the problem of adjusting two-sample tests for
dependence is illustrated by an example from Box, Hunter, and
Hunter (1978). The paper concludes with a formula for
dependence  factors  of  linear  rank  statistics  which  are  expressed 
in  terms  of  spectral  densities  at  zero  frequency  of  suitable 
rank  transformed  time  series.  To  derive  dependence  factors,  we 
use  the  asymptotic  distribution  theory  of  sample  distribution 
functions  and  sample  quantile  functions  of  stationary  time 
series.  Proofs  of  these  results  and  examples  of  their 
applications  are  given  by  A.  Harpaz  (1985)  in  his  Ph.D.  thesis. 

1. INTRODUCTION

Serial  dependence  (autocorrelation)  in  data  can  seriously 
affect  the  performance  of  standard  statistical  procedures  (such 
as  the  t-test  or  Wilcoxon  rank  sum  test  for  the  equality  of 
location parameters of two samples). The qualitative truth of
this  statement  is  well  known  to  statisticians.  But  general 
techniques  for  evaluating  quantitatively  the  properties  of 
standard  statistical  procedures  under  dependence  are  not  being 
used  by  statisticians.  This  paper  aims  to  show  how  to  develop 
the  theory  of  two-sample  statistical  procedures  in  a  way  that 
enables  statisticians  to  determine  (in  a  practical  and  effective 
way)  dependence  factors  which  adjust  tests  in  the  case  that 
dependence  is  modelled  by  a  stationary  time  series. 

To  illustrate  and  motivate  the  importance  of  the  problem  of 
adjusting  two-sample  tests  for  dependence  we  quote  an  example 
presented  by  Box,  Hunter,  and  Hunter  (1978,  pp.  81-82).  An 
experiment  is  performed  which  takes  two  samples  of  10 
observations  each  from  identical  populations  and  tests  for  a 
change  in  location  by  a  t  test  and  a  Wilcoxon  test  using  a  5% 
level  of  significance.  This  experiment  was  repeated  1000  times 
and  one  observed  the  percentage  P  of  the  number  of  experiments 
in  which  the  null  hypothesis  of  equality  of  distributions  is 
rejected.  When  the  samples  of  size  10  consist  of  independent 
observations one expects that, and observes that, approximately
P=5%. The experiment also simulated observations with errors
e(t) generated from white noise u(t) by a first order moving
average model e(t)=u(t)+bu(t-1), with b chosen so that the lag
one autocorrelation rho equaled -.4 (negative autocorrelation)
or .4 (positive autocorrelation). Under these conditions the
values observed for P were very approximately P=11% for rho=.4
and P=0.2% for rho=-.4. One would like to be able to compute
theoretical  values  of  P  which  can  be  compared  with,  and  help  us 
understand  and  predict,  the  observed  values  of  P.  The  formulas 
given  in  this  paper  show  that  the  theoretical  values  of  P  depend 
in  large  samples  on  the  value,  denoted  f(0),  at  zero  frequency 
of  the  spectral  density  function  of  the  time  series  model 
describing  the  dependence  of  the  observations. 

For a first order moving average f(0) = 1+2*rho, so that
f(0)=1.8 for rho=.4 and f(0)=.2 for rho=-.4; note that f(0)=1
for white noise (rho=0). These values of f(0) can be used to
compute  theoretical  values  of  P  (based  on  sampling  theory  for 
dependent  data)  which  are  in  rough  accord  with  the  values  of  P 
observed  by  Box,  Hunter,  and  Hunter  in  their  experiment.  The 
conclusion  drawn  by  Box,  Hunter,  and  Hunter  from  their 
experiment  is  that  the  significance  levels  of  the  t  and  Wilcoxon 
tests  are  affected  remarkably  little  by  dramatic  changes  in  the 
probability  distribution  (normal,  uniform,  skewed)  but  are 
seriously  impaired  by  serial  dependence.  To  resolve  the  problem 
of  dependent  errors  one  approach  is  to  avoid  dependence  through 
randomization.  But  when  serial  dependence  cannot  be  avoided  its 
effect  must  be  assessed  quantitatively.  This  paper  describes 
methods  for  adjusting  (for  time  series  dependence)  two-sample 
linear  rank  tests  to  have  known  sampling  distribution  under  the 
null  hypothesis. 
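
To make the role of f(0) concrete, the Python sketch below computes a large-sample rejection probability for an unadjusted two-sided test at nominal level 5%. It assumes, for illustration only, that both samples share the same f(0) and that the unadjusted statistic is approximately normal with mean 0 and variance f(0) rather than 1; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def rejection_probability(f0, alpha=0.05):
    """Large-sample probability that an unadjusted two-sided z-type test
    rejects at nominal level alpha when the statistic has variance f0."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z / sqrt(f0)))

for rho in (-0.4, 0.0, 0.4):
    f0 = 1.0 + 2.0 * rho  # first order moving average
    print(rho, round(100.0 * rejection_probability(f0), 3), "percent")
```

Under these assumptions the computed P is 5% for rho=0, roughly 14% for rho=.4, and far below 1% for rho=-.4, which is the same direction and rough size of effect as the simulated values quoted above. Exact agreement is not to be expected for samples of size 10.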

As an example, let us note that the z-statistic in eq.
(3.29) or the t-statistic in eq. (3.33) of Box, Hunter, and
Hunter (1978) could be approximately adjusted for serial
dependence by dividing by (f(0))^{1/2}. This formula generalizes
the discussion on p. 588 of Box, Hunter, and Hunter (1978).

[When f(0) = .2, its square root is .45. The adjusted
t-statistic 1.01/.45 = 2.26 or adjusted t-statistic .88/.45 =
1.96 yield P-levels comparable to that of the t-value 2.17
obtained in eq. (2.16)].

2.  LINEAR  RANK  STATISTICS  DEPENDENCE  FACTORS 

Let X(1),...,X(m) be a sample from a strictly stationary
time series with distribution function F(x) = PROB[X≤x], -∞<x<∞,
and quantile function

Q(u) = F^{-1}(u) = inf{x: F(x)≥u}, 0<u≤1.

The population mean and variance of X are denoted MX and VARX.
The sample mean and variance of X(1),...,X(m) are denoted MX(m)
and VARX(m).

Let Y(1),...,Y(n) be a sample from a strictly stationary
time series with distribution function G(x) = PROB[Y≤x],
-∞<x<∞, and quantile function G^{-1}(u). Assume that X values are
independently distributed from Y values.

Let  T  denote  a  linear  rank  statistic  to  test  the  null 
hypothesis H0 of equality of the distributions F(x) and G(x).

To compute and represent T one introduces the rank, denoted
R_j, of the j-th largest X value within the pooled sample of X
and Y values. A typical definition of T is

(1)  T = (1/m) Σ_{j=1}^{m} J(R_j/(N+1))

where N=m+n is the pooled sample size and J(u), 0≤u≤1, is a
suitable score function. The Wilcoxon rank-sum test corresponds
to J(u)=u or J(u)=u-0.5.
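
Definition (1) can be sketched in a few lines of Python. The function name and the data are hypothetical, and the sketch assumes there are no ties in the pooled sample.

```python
def linear_rank_statistic(x, y, score=lambda u: u):
    """T = (1/m) * sum over the X values of J(R/(N+1)), where R is the
    rank of the X value in the pooled sample. Assumes no tied values."""
    pooled = sorted(list(x) + list(y))
    n_pooled = len(pooled)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # ranks 1..N
    return sum(score(rank[v] / (n_pooled + 1)) for v in x) / len(x)

x = [1.2, 3.4, 2.2, 5.0]
y = [0.7, 2.9, 4.1, 6.3, 3.8]

t_wilcoxon = linear_rank_statistic(x, y)                        # J(u) = u
t_centered = linear_rank_statistic(x, y, lambda u: u - 0.5)     # J(u) = u - 0.5
print(t_wilcoxon, t_centered)
```

In this small example the X values have pooled ranks 2, 5, 3 and 8 out of N=9, so T with J(u)=u is (0.2+0.5+0.3+0.8)/4 = 0.45.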

The asymptotic distribution of T under the null hypothesis
H0 can be described in terms of A=m/N, MJ(U) = ∫_0^1 J(u) du, and

VARJ(U) = ∫_0^1 {J(u) - MJ(U)}^2 du.

The role of U will become clear in the sequel (section 4); it
represents a random variable with a uniform distribution on the
interval 0 to 1. This paper shows how to express the asymptotic
distribution of T, as N tends to ∞, in the form

√N{T - MJ(U)} is NORMAL(0, ((1-A)/A)*VARJ(U)*DEPFAC[T]).

The notation * denotes multiplication.

We use DEPFAC[T] to denote the dependence factor of T; it
equals 1 if the X's are independent random variables and the Y's are
independent random variables. The main aim of this paper is to
present a formula for the dependence factor DEPFAC[T] of a
linear rank statistic T. To adjust T for dependence we could
use {T-MJ(U)}/{DEPFAC[T]}^{1/2} as our test statistic.

To help interpret and understand the formula we present at
the end of the paper for DEPFAC[T], the next section introduces
dependence factors for sample means.

3.  DEPENDENCE  FACTORS  AND  SPECTRAL  DENSITIES  AT  ZERO  FREQUENCY 

Our notation for the theoretical mean and variance of a
random variable X is MX=E[X] and VARX = E[{X-MX}^2]. When X(t),
t=0,±1,±2,..., is a stationary time series its covariance
function is denoted R(v;X) = COV[X(t),X(t+v)] and its
correlation function is denoted

RHO(v;X) = R(v;X)/R(0;X) = CORR[X(t),X(t+v)],  v=0,±1,±2,...

The sample mean of X(1),...,X(n) is denoted

MX(n) = (1/n) Σ_{t=1}^{n} X(t).

The variance of a sample mean can be expressed

n VAR[MX(n)] = VARX*DEPFAC[MX(n)]

where

DEPFAC[MX(n)] = Σ_{v=-n}^{n} (1-|v/n|) RHO(v;X).

In  words,  the  variance  of  the  sample  mean  of  a  stationary  time 
series  can  be  represented  as  the  product  of  its  variance  for  an 
independent  sample  and  a  dependence  factor. 
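
The finite-sample dependence factor of a sample mean can be evaluated directly from a given correlation function. The Python sketch below does this for a first order moving average; the function names are hypothetical and the correlation function is supplied by the user.

```python
def depfac_mean(rho, n):
    """DEPFAC[MX(n)] = sum over v = -n..n of (1 - |v|/n) * RHO(v;X).

    rho(v) must return the lag-v autocorrelation for v >= 0, with rho(0) = 1.
    """
    return sum((1.0 - abs(v) / n) * rho(abs(v)) for v in range(-n, n + 1))

def ma1_rho(lag_one):
    """Correlation function of a first order moving average:
    1 at lag 0, lag_one at lag 1, and 0 at all higher lags."""
    return lambda v: 1.0 if v == 0 else (lag_one if v == 1 else 0.0)

print(depfac_mean(ma1_rho(0.4), 10))     # 1 + 2*(1 - 1/10)*(.4)
print(depfac_mean(ma1_rho(0.4), 10000))  # close to the limit 1 + 2*(.4)
```

For n=10 and rho=.4 the factor is 1.72, slightly below its large-sample limit of 1.8.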

For large samples (as n tends to ∞) one can relate the
dependence factor to the spectral density of the time series,
denoted

SPECDEN(w;X) = 1 + 2 Σ_{v=1}^{∞} RHO(v;X) cos 2πwv,  0≤w≤1.

For n large, the dependence factor of a sample mean is given by

DEPFAC[MX(n)] = SPECDEN(0;X).

The  advantage  of  expressing  the  dependence  factor  in  terms  of 
the  spectral  density  at  zero  frequency  is  that  it  can  be 
estimated  using  methods  of  spectral  density  estimation. 
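
One simple possibility, shown here as a Python sketch, is a lag-window estimate that replaces RHO(v;X) by sample autocorrelations and damps them with triangular (Bartlett) weights. The choice of window, the truncation point max_lag, and the simulated data are assumptions of this illustration; the paper does not prescribe a particular spectral estimator.

```python
import random

def sample_autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (divisor n at every lag)."""
    n = len(series)
    centre = sum(series) / n
    dev = [s - centre for s in series]
    c0 = sum(e * e for e in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / c0

def specden0_estimate(series, max_lag):
    """Bartlett-window estimate of SPECDEN(0;X):
    1 + 2 * sum over v = 1..M of (1 - v/M) * r(v), with M = max_lag."""
    return 1.0 + 2.0 * sum(
        (1.0 - v / max_lag) * sample_autocorrelation(series, v)
        for v in range(1, max_lag + 1)
    )

# Simulated first order moving average e(t) = u(t) + 0.5*u(t-1),
# for which the lag one autocorrelation is .5/(1 + .25) = .4.
rng = random.Random(1985)
u = [rng.gauss(0.0, 1.0) for _ in range(5001)]
e = [u[t] + 0.5 * u[t - 1] for t in range(1, 5001)]

estimate = specden0_estimate(e, max_lag=20)
print(estimate)  # should fall in the neighbourhood of 1 + 2*(.4) = 1.8
```

The truncation point trades bias against variance: a larger max_lag captures longer-range correlation but makes the estimate noisier.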

Let  us  now  consider  the  two-sample  problem  of  testing  the 
equality  of  distributions  of  two  independent  time  series  X(t) 
and  Y(t)  using  as  a  test  statistic  the  difference  of  the  sample 
means 

MX(m) = (1/m) Σ_{t=1}^{m} X(t),   MY(n) = (1/n) Σ_{t=1}^{n} Y(t).

The test statistic MX(m)-MY(n) has variance equal to the sum of
the variances of the two sample means. Therefore approximately

VAR[MX(m)-MY(n)] = (1/m)VARX*SPECDEN(0;X)+(1/n)VARY*SPECDEN(0;Y).

Assume that under H0 both MX=MY and VARX=VARY (in practice, one
might replace VARX and VARY by the variance of the pooled
sample). Then under H0 MX(m)-MY(n) has mean 0 and variance

(1)  VAR[MX(m)-MY(n)] = {NA(1-A)}^{-1}*VARX*DEPFAC[MX(m)-MY(n)]

where N=m+n, A=m/N, and the dependence factor can be expressed
approximately (for large values of m and n) in terms of spectral
densities:

(2)  DEPFAC[MX(m)-MY(n)] = (1-A) SPECDEN(0;X) + A SPECDEN(0;Y)

It should be noted that we are not assuming that the spectral
densities of X and Y are equal.
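
Equations (1) and (2) of this section translate directly into a dependence-adjusted z-statistic for the difference of two means. The Python sketch below assumes that the common variance and the two spectral densities at zero frequency are already available (known or estimated); the function names and numerical values are hypothetical.

```python
from math import sqrt

def depfac_difference(m, n, specden0_x, specden0_y):
    """Equation (2): (1 - A)*SPECDEN(0;X) + A*SPECDEN(0;Y), with A = m/(m+n)."""
    a = m / (m + n)
    return (1.0 - a) * specden0_x + a * specden0_y

def adjusted_z(mean_x, mean_y, var_common, m, n, specden0_x, specden0_y):
    """Difference of sample means divided by its large-sample standard
    deviation under H0, using equation (1):
    VAR = VARX * DEPFAC / (N * A * (1 - A))."""
    big_n = m + n
    a = m / big_n
    depfac = depfac_difference(m, n, specden0_x, specden0_y)
    variance = var_common * depfac / (big_n * a * (1.0 - a))
    return (mean_x - mean_y) / sqrt(variance)

# Two samples of size 10 with unit variance and a difference in means of 0.6.
print(adjusted_z(0.6, 0.0, 1.0, 10, 10, 1.0, 1.0))  # independent observations
print(adjusted_z(0.6, 0.0, 1.0, 10, 10, 1.8, 1.8))  # positively correlated observations
```

When both spectral densities equal 1 the variance reduces to the usual VARX*(1/m + 1/n); when both equal 1.8 the statistic shrinks by the factor (1.8)^{1/2}.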

This  formula  for  the  dependence  factor  of  the  difference  of 
two  means  is  important  for  several  reasons: 

(1) It can be used to determine the effect of dependence
on the two sample t-test; it shows that the effect for large
samples depends only on the value of the spectral densities of
X(t) and Y(t) at zero frequency.

(2) It motivates the form of answer which we seek for
linear rank statistics T, since we shall show that T - MJ(U) has
the same distribution as a difference-of-means statistic

(1-A)(MJ(UX~)(m) - MJ(UY~)(n))

in terms of time series J(UX~(t)) and J(UY~(t)) defined below.
The asymptotic variance of T therefore can be expressed, using
(1) and (2),

{NA}^{-1}(1-A) * VARJ(UX) * {(1-A) SPECDEN(0;J(UX~)) + A SPECDEN(0;J(UY~))}

The  remarkable  conclusion  which  one  is  able  to  draw  from 
this  formula  is  that  for  large  samples  the  dependence  factor  of 
linear  rank  statistics  can  be  evaluated  by  estimating  the 
spectral  density  at  zero  frequency  of  the  derived  time  series 
J(UX~(t)) and J(UY~(t)). Experience indicates that a quick and
dirty estimate of these spectral densities is provided by the
spectral densities of X(t) and Y(t) respectively. In practice
one will not know the dependence structure of the errors. The
dependence factor of T will be estimated by estimating the
spectral density at zero frequency of the time series whose
means are being compared.


4.  REPRESENTATIONS  OF  LINEAR  RANK  STATISTICS 

To study linear rank statistics we use representations for
them in terms of sample distribution functions which are valid
for both independent and dependent observations.

A sample X(1),...,X(m) has: order statistics
X(1;m)≤...≤X(m;m); sample distribution function F~(x) = fraction
of sample ≤ x; and sample quantile function Q~(u) = F~^{-1}(u)
given by

Q~(u) = X(j;m) for (j-1)/m<u≤j/m.

One  also  uses  continuous  versions  of  the  discrete  sample 
quantile function. A sample Y(1),...,Y(n) has: order statistics
Y(1;n)≤...≤Y(n;n) and sample distribution function G~(x).

One pools the two samples to form a pooled sample
X(1),...,X(m), Y(1),...,Y(n) of size N=m+n which has sample
distribution function H~(x) satisfying H~(x) = AF~(x) +
(1-A)G~(x). The limit of H~(x) is H(x) = AF(x) + (1-A)G(x).

In  the  one-sample  problem  we  call  U(t)  =  F(X(t)), 
t=1,...,m,  the  rank  transformed  variables;  their  marginal 
distribution is uniform on 0 to 1. Sample rank transformed
variables U~(t) are defined by a formula such as U~(t) =
(m/(m+1))F~(X(t)) which assigns ranks 1/(m+1),...,m/(m+1) to the
order statistics X(1;m),...,X(m;m).

In  the  two-sample  problem  the  rank  transformed  variables 
are defined to be H(X(t)) and H(Y(t)). The sample rank
transformed variables are

UX~(t) = (N/(N+1))H~(X(t)),  UY~(t) = (N/(N+1))H~(Y(t)).

A linear rank statistic T as traditionally defined by
eq. (1) of section 2 can be represented

T = (1/m) Σ_{j=1}^{m} J((N/(N+1)) H~(X(j;m))) = MJ(UX~)(m).

An alternative statistic, which our analysis shows provides more
insight into the asymptotic distribution, is the
difference-of-means statistic; one can show that asymptotically
[and exactly for J(u)=u]

T - MJ(U) = (1-A)(MJ(UX~)(m) - MJ(UY~)(n)).

To relate T to sample distribution functions we represent it

T = ∫_{-∞}^{∞} J((N/(N+1)) H~(x)) dF~(x).

Our approach is to write approximately

T = ∫_0^1 J(u) dF~(H~^{-1}(u)).

This formula is not used. But it suggests one should try to
represent T exactly as

T = ∫_0^1 J(u) dD~(u)

where D~(u) is a suitable estimator of D(u) = FH^{-1}(u), 0<u≤1.

We call D(u) a comparison quantile function.

We would like to define D~(u) in terms of sample
distribution functions so that it is a step function with jumps
equal to 1/m at u=(N/(N+1))R_j. Parzen (1983) shows that this
can be accomplished if D~(u) is defined as the inverse of
D~^{-1}(t) = H~F~^{-1}(t), 0<t≤1.

Our motivations for introducing D(u) and D~(u) are diverse.

(1) They implement our philosophy that every graph should
be a picture of a function. Various techniques for graphical
analysis of samples, such as P-P plots and Q-Q plots, can be
regarded as sample versions of theoretical functions of the form
of D(u).

(2) The conclusions that one obtains arithmetically from
the value of a linear rank statistic can often be discovered
graphically (at a glance) from a graph of D~(u).

(3)  In  cases  where  the  value  of  T  indicates  no  significant 
difference  between  the  two  samples,  the  graph  of  D~(u)  may 
indicate  important  ways  in  which  the  samples  differ. 

(4)  The  empirical  process  D~(u)  is  important  as  a 
practical  basis  for  data  analysis  (as  outlined  in  reasons  (2) 
and  (3))  and  as  a  theoretical  basis  for  deriving  the  properties 
of  linear  rank  statistics.  The  asymptotic  distribution  of 
D~(u), 0≤u≤1, is derived by expressing it in terms of the
asymptotic  distributions  of  the  sample  distribution  functions  of 
the  independent  stationary  time  series  X(t)  and  Y(t).  The 
rigorous  theory  of  the  latter  has  recently  been  completed  by 
Pham  and  Tran  (1985)  as  the  culmination  of  a  long  line  of 
research  papers  starting  with  the  pioneering  work  of  Gastwirth 
and Rubin (1975).

5.  EMPIRICAL  PROCESSES  OF  STATIONARY  TIME  SERIES 

Let F~(x) and Q~(u) denote the sample distribution and
sample quantile function of X(1),...,X(n), a sample from a
stationary time series X(t). Let CFX(x), -∞<x<∞, and
CF^{-1}X(u), 0<u≤1, denote stochastic processes representing the
limiting distributions of √n(F~(x)-F(x)), -∞<x<∞, and
√n(F~^{-1}(u)-F^{-1}(u)), 0≤u≤1, respectively. One can show that
there is a zero mean Gaussian stochastic process denoted BX(u),
0≤u≤1, such that

CFX(x) = BX(F(x)),  CF^{-1}X(u) = {-1/fF^{-1}(u)} BX(u),

where f = F' denotes the probability density function of X.

Thus the asymptotic distribution of the sample distribution and
sample quantile functions can be expressed in terms of the
process BX(u), 0≤u≤1.

For  independent  random  variables  (white  noise)  X(t),  the 
limit  process  BX(u)  is  a  Brownian  Bridge,  which  is  a  zero  mean 
Gaussian  process  with  covariance  kernel 

E[BX(u1) BX(u2)] = u1(1-u2) for u1 ≤ u2.

An indication of the formulas required to describe BX(u)
when X(t) is a time series is provided by the limit
distributions of independent samples of bivariate dependent
random variables (X(t), Y(t)). Then the limit processes BX(u)
and BY(u) are each Brownian Bridges but they are not independent
of each other. They have joint covariance kernel

E[BX(u1) BY(u2)] = F(QX(u1),QY(u2)) - u1*u2,  0≤u1,u2≤1,

where F(x,y)=PROB[X≤x, Y≤y] is the joint distribution function of
X and Y. We call F(QX(u1),QY(u2)) the bivariate dependence
function of X and Y; an alternative name (used by some authors)
is copula.

To express the covariance kernel of BX(u), 0<u≤1, in the
case that X(t) is a stationary time series, it is more
convenient (for insight and computation and to avoid a
complicated infinite summation of bivariate dependence
functions) to represent the covariance structure as a formula
for the variance of a general linear functional ∫_0^1 g(u) dBX(u)
for suitable functions g(u). Let U(t)=F(X(t)) be the rank
transform, and form the time series g(U) whose value at t is
g(U(t)). Equivalently we write g(U)=g(F(X)).

BASIC THEOREM ON EMPIRICAL PROCESS OF STATIONARY TIME
SERIES: The distribution of BX(u), 0<u≤1, can be described in
terms of the spectral density at zero frequency of the time
series g(U(t)), U(t)=F(X(t)), which are estimated by g(U~(t)),
U~(t) = F~(X(t)):

VAR[∫_0^1 g(u) dBX(u)] = VARg(U) SPECDEN(0;g(U))

where

VARg(U) = ∫_0^1 g^2(u) du - |∫_0^1 g(u) du|^2.

The asymptotic distribution of linear rank statistics is
obtained from formulas for the asymptotic distribution of linear
functionals in the sample comparison quantile function D~(u),
0<u<1, defined in section 4. One can show that (in the sense of
convergence of stochastic processes)

√N{D~(u)-D(u)} → CD(u)

where the limit process CD(u), 0<u<1, can be expressed in terms of
independent limit processes BX(u), 0<u<1, and BY(u), 0<u<1, by

CD(u) = -(1-A){A^{-1/2}BX(u) - (1-A)^{-1/2}BY(u)}.

The processes BX(u) and BY(u) are related to the processes
defined in the Basic Theorem on Empirical Processes. Their
covariance kernels are expressed in terms of the spectral
densities at zero frequency of the time series J(UX~(t)) and
J(UY~(t)):

VAR[∫_0^1 J(u) dBX(u)] = VARJ(U) SPECDEN(0;J(UX~)),

VAR[∫_0^1 J(u) dBY(u)] = VARJ(U) SPECDEN(0;J(UY~)).

By  combining  all  these  results  one  can  obtain  the  formula 
given  in  section  2  for  the  asymptotic  distribution  of  a  linear 
rank  statistic  for  two  samples  from  stationary  time  series  with 
dependence  factor  DEPFAC[T]  estimated  by 

DEPFAC[T] = (1-A) SPECDEN(0;J(UX~)) + A SPECDEN(0;J(UY~))
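
The whole recipe can be sketched in Python as follows: rank-transform the pooled sample, apply the score function J, estimate the spectral density at zero frequency of each derived series, and combine the two estimates with weights (1-A) and A. The Bartlett-window spectral estimate, the truncation point max_lag, the assumption of no ties, and all function names are choices made for this illustration and are not taken from the paper or from Harpaz (1985).

```python
import random

def specden0(series, max_lag):
    """Bartlett-window estimate of the spectral density at zero frequency."""
    n = len(series)
    centre = sum(series) / n
    dev = [s - centre for s in series]
    c0 = sum(e * e for e in dev)
    total = 1.0
    for v in range(1, max_lag + 1):
        r_v = sum(dev[t] * dev[t + v] for t in range(n - v)) / c0
        total += 2.0 * (1.0 - v / max_lag) * r_v
    return total

def depfac_rank_statistic(x, y, score=lambda u: u, max_lag=20):
    """Estimate DEPFAC[T] = (1-A)*SPECDEN(0;J(UX~)) + A*SPECDEN(0;J(UY~))."""
    m, n = len(x), len(y)
    big_n = m + n
    pooled = list(x) + list(y)
    order = sorted(range(big_n), key=lambda i: pooled[i])
    u = [0.0] * big_n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (big_n + 1)        # (N/(N+1)) * H~(value), kept in time order
    jx = [score(v) for v in u[:m]]       # the series J(UX~(t))
    jy = [score(v) for v in u[m:]]       # the series J(UY~(t))
    a = m / big_n
    return (1.0 - a) * specden0(jx, max_lag) + a * specden0(jy, max_lag)

def ma1(rng, length, b):
    """Simulated first order moving average e(t) = u(t) + b*u(t-1)."""
    w = [rng.gauss(0.0, 1.0) for _ in range(length + 1)]
    return [w[t] + b * w[t - 1] for t in range(1, length + 1)]

rng = random.Random(31)
x = ma1(rng, 4000, 0.5)  # lag one autocorrelation .4
y = ma1(rng, 4000, 0.5)
print(depfac_rank_statistic(x, y))
```

With positively correlated series such as these the estimated factor should come out well above 1, so dividing the centered statistic by its square root makes the test more conservative than the unadjusted version.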

A more complete proof of this result, and examples of its
applications, are given by Harpaz (1985) in his Ph.D. thesis.

ABSTRACT

Clustering  of  individuals,  segmentation  of  time  series  and 
segmentation  of  numerical  images  can  all  be  considered  as  labeling 
problems, for each can be described in terms of pairs (x_t, g_t), t =
1,2,...,n, where x_t is the observation at instance t and g_t is
the unobservable "label" of instance t. The labels are to be
estimated,  along  with  any  unspecified  distributional  parameters.  In 
cluster  analysis  the  values  of  t  are  the  individuals  (cases)  observed 
and  the  x's  are  independent.  In  time  series  the  values  of  t  are  time 
instants  and  there  is  temporal  correlation.  In  numerical  image 
segmentation  the  values  of  t  denote  picture  elements  (pixels)  and 
spatial  correlation  between  neighboring  pixels  can  be  utilized.  The 
idea  in  segmentation  is  that  signals  and  time  series  often  are  not 
homogeneous  but  rather  are  generated  by  mechanisms  or  processes  with 
various  phases.  Similarly,  images  are  not  homogeneous  but  contain 
various  objects.  "Segmentation"  is  a  process  of  attempting  to  recover 
automatically  the  phases  or  objects.  A  labeling  model  for  representing 
such  signals,  time  series,  and  images  was  discussed  in  a  paper  by  the 
present  author  in  the  Proceedings  of  the  30th  Conference;  some 
approaches  to  estimation  and  segmentation  in  this  model  were  presented. 
The  present  paper  summarizes  the  work  on  all  these  types  of  labeling 
problems,  clustering  as  well  as  time  series-  and  image-segmentation. 

Key words and phrases: statistical pattern recognition,
classification; temporal correlation, spatial correlation; optimization
by relaxation method.

1. Introduction

The  research  reported  here  relates  to  cluster  analysis  and 
numerical  processing  of  time  series  and  images.  It  is  in  part  a 
discussion of work performed under ARO Contract DAAG29-82-K-0155
(6/15/82 - 6/15/85), Statistical Models and Methods for Cluster
Analysis and Image Segmentation. The types of datasets to which the
techniques developed are applicable include: signals such as radar and
sonar; economic and bio-medical time series; time series arising from
quality assurance acceptance sampling by attributes or variables; and
digital images which can result from various sources, including
bio-medical imagery, infrared imagery obtained by smart munitions,
and multispectral data obtained by satellite. The problems addressed
are  those  of  clustering,  and  segmentation  of  time  series  and  images. 

The  work  involves  the  further  development  of  algorithms  for 
clustering  large,  multidimensional  datasets  and  for  segmentation  of 
time  series  and  digital  images.  The  algorithms  are  based  on  maximum 
likelihood  estimation  in  distribution-mixture  models.  In  the  context 
of  these  mixture  models  clustering  is  construed  as  estimation  of 
unobserved  labels.  An  observation's  label,  were  it  observable,  would 
tell from which mixture component the observation arose. Image
segmentation is also considered as a labeling problem. Throughout the
work there is an attempt to apply model-selection criteria to the
decision as to an appropriate number of clusters or classes of segment.

Software  development  is  an  important  aspect  of  such  a  project. 
The  algorithms  developed  are  programmed  in  FORTRAN. 

Some of the ideas discussed in the present paper have been
developed and published in journals; see Sclove (1977, 1983a,b,c,
1984a) and Bozdogan and Sclove (1984).

The  organization  of  the  present  paper  is  as  follows:  Section  2 
concerns  cluster  analysis;  in  this  section  there  is  some  general 
discussion  of  model-selection  criteria  and  a  digression  to  mention  some 
ideas  concerning  clustering  of  variables.  Section  3  summarizes  some  of 
the  results  on  time-series  segmentation,  and  results  on  image 
segmentation  are  discussed  in  Section  4. 

2.  Cluster  analysis 

Background.  The  mixture  model  for  the  clustering  problem 
postulates  a  mixture  of  k  distributions.  This  is  the  approach  put 
forth in (Sclove 1977). The research problem set there was, at least
in part, to see whether the ISODATA (Ball and Hall, 1967) and K-MEANS
(MacQueen, 1967) algorithms could be interpreted as
mathematical-statistical estimation schemes in some model for the
clustering problem. That is, did there exist a model for the
clustering problem, and an estimation method in that model, such that
ISODATA and K-MEANS corresponded to that method applied to that model?
The answer, provided in (Sclove 1977), was affirmative; this will be
explained  below,  but  first  let  us  briefly  define  ISODATA  and  K-MEANS. 

The  "isodata"  scheme  proceeds  as  follows.  One  starts  with 
tentative  estimates  of  cluster  means  as  seed  points  for  the  clusters 
and  assigns  each  observation  to  the  mean  to  which  it  is  closest.  The 
cluster  means  are  then  re-estimated,  and  one  loops  through  the  data 
again, reassigning the observations, and so on. In the K-MEANS algorithm,
the seed points are updated immediately after each observation is
tentatively classified. In (Sclove 1977) it was shown that these
algorithms correspond to iterative maximum likelihood estimation in a
type of mixture model for the clustering problem, where the component
distributions are multivariate normal.
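
The batch scheme just described (assign every observation to its nearest mean, re-estimate the means, and repeat) can be sketched in Python as follows. This is only an illustration of that loop with Euclidean distance; the function name, the stopping rule, the handling of empty clusters, and the data are assumptions, and the original project software was written in FORTRAN.

```python
def isodata_style_clustering(points, seeds, max_iter=100):
    """Assign each point to the nearest mean, re-estimate the means, and
    repeat until the assignment stops changing.
    Points and seeds are tuples of floats of equal length."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    means = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(means)), key=lambda g: sqdist(p, means[g]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for g in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # keep the old mean if a cluster becomes empty
                means[g] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means

# Hypothetical two-dimensional observations forming two well separated groups.
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, means = isodata_style_clustering(points, seeds=[(0.0, 0.0), (1.0, 1.0)])
print(labels)
print(means)
```

A K-MEANS variant in the sense described above would differ only in updating the affected mean immediately after each observation is classified, rather than once per pass through the data.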

This  clustering  can  be  done  for  various  values  of  k,  the  number  of 
clusters.  Figures  of  merit  can  be  used  to  choose  the  best  k. 
Model-selection  criteria  can  be  used  as  figures  of  merit. 

2.1. Model-selection criteria

In  the  context  of  a  mixture  model,  choice  of  the  number  of 
clusters  k  can  be  viewed  as  a  model-selection  problem.  However, 
at  least  in  the  case  of  clustering  individuals,  existing 
model-selection  criteria  have  to  be  modified,  as  they  depend  upon 
(regularity)  assumptions  that  are  not  always  met  in  mixture  models 
for  clustering  individuals. 

In  any  case,  let  us  review  some  of  the  existing  model-selection 
criteria.  Consider,  then,  a  problem  of  choosing  from  among  several 
models, indexed by k (k = 1,2,...,K). Let L(k) be the likelihood,
given the k-th model. Various model-selection criteria taking the form

-2 log(max L(k)) + a(n) m(k) + b(k),   (1)

have been developed in relatively recent years. Here n is the sample
size, log denotes the natural logarithm, max L(k) denotes the maximum
of the likelihood over the parameters, and m(k) is the number of
independent parameters in the k-th model. For a given criterion, a(n)
is the cost of fitting an additional parameter and b(k) is an
additional term depending upon the criterion and the model k.

Akaike (see, e.g., Akaike 1973, 1974, 1981) developed such a
criterion as a (heuristic) estimate of the expected entropy
(Kullback-Leibler information). Akaike's information criterion (AIC)
is of the form (1) with

a(n) = 2 for all n, b(k) = 0 (AIC).   (2)
Schwarz  (1978),  working  from  a  Bayesian  viewpoint,  obtained  a  criterion 
of the form (1) with

a(n) = log n, b(k) = 0 (Schwarz' criterion).   (3)
Since log n exceeds 2 for n of 8 or more (log 8 is about 2.08), it
follows that Schwarz' criterion favors models with fewer parameters
than does Akaike's.
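
As a small illustration of how criteria of the form (1) are compared across models, the sketch below evaluates AIC and Schwarz' criterion for three candidate models. The maximized log-likelihoods, parameter counts, and sample size are hypothetical values chosen for the example, and the helper names are assumptions, not part of any program cited here.

```python
import math

def criterion_value(max_loglik, m, a, b=0.0):
    """Form (1): -2 log(max L(k)) + a(n) m(k) + b(k)."""
    return -2.0 * max_loglik + a * m + b

def aic(max_loglik, m):
    # a(n) = 2 for all n, b(k) = 0
    return criterion_value(max_loglik, m, a=2.0)

def schwarz(max_loglik, m, n):
    # a(n) = log n (natural logarithm), b(k) = 0
    return criterion_value(max_loglik, m, a=math.log(n))

def best_model(scores):
    """Return the model index k with the smallest criterion value."""
    return min(scores, key=scores.get)

# Hypothetical (max log-likelihood, number of parameters) for k = 1, 2, 3.
fits = {1: (-120.0, 2), 2: (-100.0, 5), 3: (-96.0, 8)}
n = 50  # hypothetical sample size

aic_scores = {k: aic(ll, m) for k, (ll, m) in fits.items()}
schwarz_scores = {k: schwarz(ll, m, n) for k, (ll, m) in fits.items()}
```

With these made-up numbers AIC prefers the largest model while Schwarz' criterion prefers the middle one, which illustrates the heavier penalty per parameter once log n exceeds 2.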

Noting that for AIC a(n) is a constant function of n, namely 2,
various researchers, including Kashyap (1982) and Schwarz (1978), have
mentioned that AIC is not consistent; a(n) needs to depend upon n.

Kashyap (1982), also working from a Bayesian approach, took the
asymptotic expansion of the logarithm of the posterior probabilities a
term further than did Schwarz and obtained the criterion of the form
(1) given by

a(n) = log n, b(k) = log(det B(k)) (Kashyap's criterion),   (4)
where det denotes the determinant and B(k) is the negative of the
matrix of second partials of log L(k), evaluated at the maximum
likelihood estimates. In Gaussian linear models this is the inverse of
the covariance matrix of the maximum likelihood estimates of the
regression coefficients; in general, the expectation of B(k), evaluated
at the true parameter values, is Fisher's information matrix. Since
Kashyap's criterion is based on reasoning similar to Schwarz', but
contains an extra term, it may perform better. [Further comments on
model-selection criteria are made in Sclove (1983d).]

2.2. Multi-sample clustering

The  problem  of  multi-sample  clustering,  the  grouping  of  samples, 
is treated in Bozdogan and Sclove (1984). The situation is the
K-sample problem (one-way analysis of variance), with an emphasis on
grouping the samples into fewer than K clusters. The use of
model-selection criteria in this context can provide an alternative to
multiple-comparison procedures. Use of model-selection criteria avoids
the difficult choice of levels of significance in such problems.
Model-selection criteria can also be used in this context to decide
whether  or  not  to  assume  a  common  covariance  matrix.  Kashyap's 
criterion  could  be  evaluated  and  used  for  these  problems. 

2.3.  Clustering  of  individuals 

Schwarz'  and  Kashyap's  criteria  could  be  calculated  for  the 
problem  of  clustering  individuals  according  to  Wolfe's  (1970) 
mixture-model  clustering  approach  and  incorporated  into  computer 
programs  for  clustering.  The  values  of  the  criteria  can  be  used 
heuristically as figures of merit for alternative models, but in order
to be rigorously applied the model-selection criteria need to be
modified since their derivation involves an assumption of
nonsingularity of the information matrix. However, note in
this regard a potential advantage of model-selection criteria
over a hypothesis-testing approach in this and similar
situations. Model-selection criteria require nonsingularity of
the information matrix only for each fixed model k. The testing
approach runs into difficulties because of singularity of the
matrix at the boundary between the null and alternative hypotheses
(i.e., at the boundary between models).

2.4. Clustering of variables


The clustering of variables can also be viewed as a
model-selection problem. For example, whether and how to cluster
multinomial variables depends upon which covariances may be assumed to
be zero; the possible patterns of zeros among the covariances are
separate models, a figure of merit for which is provided by a suitable
model-selection criterion. This idea is to be further developed.

3.  Time-series  segmentation 

As  mentioned  above,  a  model  for  clustering  or  segmentation  is 
given  by  assuming  that  each  instance  of  observation,  t,  gives  rise  not 
only to an observation but also to a label, g_t, equal to 1, 2,
..., or k, where k is the number of classes of segment.
Model-selection criteria are used to estimate k. In the context of
this  model,  segmentation  is  merely  estimation  of  the  labels.  Sclove 
(1983b, c;  1984a)  treats  the  problem  by  modeling  the  label  process  as 
a  Markov  chain.  An  algorithm  and  computer  programs  are  discussed; 
numerical  examples  are  given. 

The model involves three sets of parameters: the distributional
parameters (e.g., means and covariance matrices), the labels, and the
transition probabilities between labels.

The  algorithm  is  a  relaxation  method,  similar  to  the  EM  algorithm. 
The estimation step consists of maximum-likelihood estimation of the
distributional parameters, for tentatively fixed values of the labels
and transition probabilities. The maximization step consists of
maximizing the likelihood over the labels and transition probabilities,
for tentatively fixed values of the distributional parameters.

As developed so far, the algorithm is a forward algorithm,
classifying x2 after x1, x3 after x2 and x1, etc. It is
suitable for sequential operation in real time, but it is non-optimal
in other modes of operation. Its performance could possibly be
improved by a backcasting technique analogous to that in Box and
Jenkins (1976) and by application of the Viterbi algorithm (Forney
1973), which is a recursive optimal solution to the problem of
estimating the state sequence of a discrete-time finite-state
Markov process; it is applicable here because this is what we have
at each stage when the distributional parameters and transition
probabilities are tentatively fixed and the labels are to be estimated.
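
A minimal sketch of the Viterbi recursion for this label-estimation step follows. It is an illustration under stated assumptions rather than the algorithm implemented in the cited programs: the function name `viterbi`, the two-class example, the unit-variance Gaussian class densities, and the transition probabilities are all hypothetical. The inputs are the log initial probabilities, the log transition matrix, and the class-conditional log-densities of each observation, all treated as tentatively fixed.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most probable label sequence; log_emit[t, j] = log f(x_t | label j)."""
    n, k = log_emit.shape
    score = np.empty((n, k))            # best log-probability of a path ending in j at t
    back = np.zeros((n, k), dtype=int)  # best predecessor label for each (t, j)
    score[0] = log_init + log_emit[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: from label i to label j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 2, -1, -1):      # trace the best path backwards
        path[t] = back[t + 1, path[t + 1]]
    return path

# Hypothetical two-class series: class means 0 and 5, unit variance.
x = np.array([0.1, -0.2, 0.3, 5.1, 4.8, 5.2])
class_means = np.array([0.0, 5.0])
log_emit = -0.5 * (x[:, None] - class_means[None, :]) ** 2  # up to an additive constant
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])
path = viterbi(log_init, log_trans, log_emit)
```

Unlike a forward-only classification, the backward trace lets later observations influence earlier labels.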

Further,  the  parameter-estimation  step  of  the  algorithm  can  be 
improved.  The  estimation  implemented  in  the  existing  algorithm  leads 
to estimates that are biased (even asymptotically). (See, e.g., Bryant
and Williamson 1978.) This bias may be viewed as due to the
truncation resulting from the algorithm. The estimation could be
modified by doing it in a Bayesian manner, e.g., estimate the mean of
Class A as

$$ \frac{\sum_{t=1}^{n} x(t)\,\Pr(a \mid x(t))}{\sum_{t=1}^{n} \Pr(a \mid x(t))} $$

(In this expression, Pr(a|x) can be replaced by Pr(x|a) since
Pr(a)/f(x) will cancel out.) This modification in the
parameter-estimation step can be important. For, in this estimate,
all the observations play a role, whether labeled as "Class A" or
otherwise, so that at least some of the bias incurred by using only
the "a" observations will be removed by allowing all of the
observations to enter.

The work done to date is explicit only for the case in which the
class-conditional processes consist of independent, identically
distributed random variables. The work is to be extended to other,
often more realistic cases, such as that of autoregression within
segments.
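
The posterior-weighted estimate of a class mean proposed in this section can be sketched as below. The function name and the numerical values are hypothetical; the posterior probabilities Pr(a | x(t)) are assumed to have been computed elsewhere from the tentatively fixed parameters.

```python
import numpy as np

def posterior_weighted_mean(x, posterior_a):
    """Weighted mean of all observations, with weights Pr(a | x(t))."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(posterior_a, dtype=float)
    # np.average computes sum(w * x) / sum(w) along the time axis
    return np.average(x, axis=0, weights=w)

# Hypothetical observations and posterior probabilities of Class A.
x = np.array([0.0, 1.0, 10.0])
soft = posterior_weighted_mean(x, [0.9, 0.8, 0.1])
hard = posterior_weighted_mean(x, [1.0, 1.0, 0.0])  # 0/1 weights: ordinary class mean
```

With 0/1 weights the estimate reduces to the mean of the observations labeled "Class A"; with fractional weights every observation contributes.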

4.  Image  segmentation 

Similar ideas are applied to digital images in Sclove
(1983a; 1984a). Here the label process is modeled as a Markov random
field.  The  same  improvements  made  in  the  time-series  context  will  be 
carried  over  to  the  two-dimensional,  image-processing  context.  For 
example,  computer  experiments  (Sclove  1984b)  with  the  existing 
algorithm  have  shown  it  to  be  successful,  even  in  finding  small 
targets.  However,  at  the  same  time,  these  experiments  have  shown  the 
importance  of  some  such  modification  as  backcasting,  as  mentioned  in 
connection  with  time  series,  to  eliminate  anomalous  border  effects. 

Extension  of  the  existing  work  to  two-dimensional  autoregressions 
within segments will yield algorithms that may detect textures.


ABSTRACT: Visibility is produced by a variety of meteorological factors
related to micro-, meso-, and macro-scale processes. In addition the
frequency  distribution  of  visibility  is  non-Gaussian.  Thus  a  factor 
analysis  is  not  trivial. 


Today  factor  analysis  is  aided  by  "canned"  programs  on  most  larger 
computer  systems.  However,  most  of  the  time  it  is  not  readily 
understood  what  these  programs  produce.  Thus  an  investigation  was 
performed  to  compare  four  different  approaches  of  a  factor  analysis.  A 
principal  components  analysis,  an  unweighted  least  squares,  a  general 
least  squares  approach  and  a  maximum  likelihood  method  were  examined 
for  a  basic  correlation  matrix  of  eight  atmospheric  parameters  and  for  a 
7-year  record  of  Stuttgart,  Germany.  Furthermore,  unrotated  factors, 
and  orthogonal  and  oblique  rotation  of  factors  were  included.  As 
expected  the  results  of  the  factor  analysis  differ  in  details.  However, 
the  four  methods  show  some  common  principles. 

1. INTRODUCTION: Factor analysis was used in behavioral science when
Spearman (1904, 1927), Cattell (1952 and 1965), and others established
the basic statistical-mathematical background. The physical sciences
followed  hesitantly.  Factor  analysis  in  the  atmospheric  sciences  can  only 
be  found  in  the  last  two  decades,  e.g.  Christensen  and  Bryson  (1966), 
Kutzbach  (1967),  Buell  (1971)  etc. 

In  part  this  was  due  to  the  elaborate  mathematical  procedure  which 
is  required  in  the  mathematical  solution.  Today,  factor  analysis  is  aided 
by  electronic  data  processing.  In  recent  times  even  "canned  programs"  are 
available.  Thus  the  mathematical  difficulties  have  been  resolved.  The 
physicist will find several methods of estimation, however, and may be
unsure which method is most suitable and provides the best estimators.
Furthermore, in order to draw the correct conclusions from the solutions
produced by those "canned programs," it is necessary to separate the
"mathematics" from the "physics."

This  study  serves  to  elucidate  some  of  the  mathematical  background 
and  reveal  some  physical  characteristics  by  comparing  the  results  for 
several  methods  of  factor  analysis  applied  to  data  of  a  seven-year  record 
of  atmospheric  parameters  for  Stuttgart,  Germany. 

We learn that the estimators for the "communalities" differ for the
individual  methods.  This  is  expected.  The  physical  characteristics  of  the 
factors,  however,  display  great  similarity  after  rotation  of  the  coordinate 
system  although  the  sequence  is  not  always  the  same  for  the  individual 
methods. 

2.  PRINCIPAL  COMPONENTS  ANALYSIS.  The  basic  model  for  factor 
analysis  can  be  formulated  as  follows: 

$$ M_X = M_A M_F + M_\varepsilon \qquad (1) $$

where $M_X$ is a data matrix (the only known matrix in Eqn 1), $M_A$ a
coefficient matrix of factors, $M_F$ the factor matrix, and $M_\varepsilon$ an error
matrix. $M_A$ is also called the factor loading matrix or factor pattern. In
the basic factor analysis neither the factors are correlated, nor are the
factors and the errors.

The  mathematical  solution  for  Eqn  (1)  can  be  formulated  as: 

$$ M_X = M_A \Phi M_A^T + \Psi \qquad (2) $$

where $\Phi = M_F^T M_F$ is a factor covariance matrix and
$\Psi = M_\varepsilon^T M_\varepsilon$ a diagonal error matrix.

As stated above, $M_X$ is a data matrix. In its standardized form $M_X$
is a correlation matrix $M_R$ with unity in its diagonal. This is called a
"closed" system or principal components analysis. Then the error
matrix $\Psi$ has zero elements outside the diagonal.

The  true  factor  analysis  is  based  on  the  postulation  that  not  all 
factors  are  known.  In  order  to  account  for  this  fact  the  diagonal  in  the 
correlation matrix $M_R$ must be reduced, i.e. the diagonal elements are
less than 1.0. These diagonal elements are also called "communalities".
Determining $\Phi$ and $M_A$ requires a solution for

$$ M_R = M_A \Phi M_A^T \qquad (3) $$

which is a known problem in mathematics. The model can be
reformulated:

$$ D_\lambda = M_A^T M_R M_A \qquad (4) $$

with $M_A^T = M_A^{-1}$ and $D_\lambda$ a diagonal matrix. $D_\lambda$ is called the matrix of
eigenvalues and $M_A$ contains the eigenvectors. In the principal
components analysis $M_A^T M_A = I$. For more details see Essenwanger
(1976).

3. THE COMMUNALITIES. Four different methods have been studied in
this investigation. In the first method a principal components analysis
(PC) is performed and a specific number of factors is accepted. E.g. for
a correlation matrix with 8 x 8 dimension, 8 principal component factors
are obtained from the mathematical model. We may decide to select the
largest 4 factors. This is equivalent to a truncation. The communalities
are then recalculated from these 4 accepted factors. This procedure may
appear to be somewhat arbitrary and subjective. It must be pointed
out, however, that the number of physical factors is unknown. Although
the total number of factors in the principal components analysis is
determined by the dimension of the matrix $M_R$, the uncertainty of factors
with significance in physics is contained in the chosen number of
elements in the $M_R$ matrix. A formalistic mathematical solution can be
achieved for any dimension of the correlation matrix $M_R$. However,
whether all possible factors in the principal components analysis have
significant meaning in physics is not determined by the mathematical
solution.
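
The truncation step of the first method can be sketched as follows. This is a minimal illustration, assuming the usual scaling in which each retained eigenvector is multiplied by the square root of its eigenvalue to give the factor loads; the function name and the 3 x 3 correlation matrix are hypothetical and unrelated to the Stuttgart data.

```python
import numpy as np

def pc_communalities(R, n_factors):
    """Eigenvalues, truncated loadings, and communalities of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.clip(eigvals[:n_factors], 0.0, None)
    loadings = eigvecs[:, :n_factors] * np.sqrt(lam)
    communalities = (loadings ** 2).sum(axis=1)  # row sums of squared loadings
    return eigvals, loadings, communalities

# Hypothetical correlation matrix with unity in the diagonal.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
eigvals, full_loadings, h2_full = pc_communalities(R, 3)  # no truncation
_, loadings_2, h2_trunc = pc_communalities(R, 2)          # keep the 2 largest factors
```

Without truncation every communality equals 1; after truncation the communalities fall below 1 and their sum equals the sum of the retained eigenvalues.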

The  number  of  factors  is  also  a  subjective  choice  in  the  other  three 
methods.  Thus  the  truncation  of  factors  in  the  principal  components 
analysis  is  not  worse  than  the  assumption  of  the  number  of  factors  in 
the  other  three  methods. 

The other three methods differ in how estimators are calculated for
the communalities. We assume the number of factors which are
accepted and obtain estimators as follows.

The unweighted least squares method (ULSQ) requires that U is a
minimum for

$$ U = \tfrac{1}{2}\,\mathrm{tr}\,(M_S - M_X)^2 \qquad (5) $$

where $M_S$ is the correlation matrix with estimators in the diagonal and
tr means the trace.

In the generalized least squares method (GLSQ) G is a minimum for

$$ G = \tfrac{1}{2}\,\mathrm{tr}\,(I_n - M_S^{-1} M_X)^2 \qquad (6) $$

where $I_n$ denotes a diagonal matrix of unity and $M_S$ and $M_X$ are the same
as under Eqn (5).

Finally, the maximum likelihood principle (MXLI) is applied to
minimize:

$$ M = \mathrm{tr}\,[(M_X^{-1} M_S) - \ln(M_X^{-1} M_S)] - n \qquad (7) $$

(see Joreskog, 1967) where n is the number of variables.

Other methods to substitute estimators for the diagonal in $M_R$ exist
(see Essenwanger, 1976) but were not included in the present study; see
also Guttman (1956).
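
For readers who want to see what the three fitting functions measure, the sketch below evaluates Eqns (5) to (7) for a given pair of matrices. It only evaluates the criteria and does not perform the minimization that the "canned" programs carry out. The function names and the 2 x 2 matrices are hypothetical, and the logarithm term in Eqn (7) is computed as a log-determinant, using the identity that the trace of a matrix logarithm equals the logarithm of the determinant.

```python
import numpy as np

def ulsq(M_S, M_X):
    """Eqn (5): U = (1/2) tr (M_S - M_X)^2."""
    D = M_S - M_X
    return 0.5 * np.trace(D @ D)

def glsq(M_S, M_X):
    """Eqn (6): G = (1/2) tr (I_n - M_S^{-1} M_X)^2."""
    n = M_X.shape[0]
    D = np.eye(n) - np.linalg.solve(M_S, M_X)  # solve avoids an explicit inverse
    return 0.5 * np.trace(D @ D)

def mxli(M_S, M_X):
    """Eqn (7): M = tr(M_X^{-1} M_S) - ln det(M_X^{-1} M_S) - n."""
    n = M_X.shape[0]
    P = np.linalg.solve(M_X, M_S)
    sign, logdet = np.linalg.slogdet(P)
    return np.trace(P) - logdet - n

# Hypothetical matrices: M_X observed, M_S a candidate approximation.
M_X = np.array([[1.0, 0.5], [0.5, 1.0]])
M_S = np.array([[1.0, 0.4], [0.4, 1.0]])
values = (ulsq(M_S, M_X), glsq(M_S, M_X), mxli(M_S, M_X))
```

All three functions are zero when the two matrices coincide and positive otherwise for positive definite inputs, but they weight the discrepancies differently, which is one reason the resulting estimators differ.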

4. ROTATIONS. Although the solution of $M_A$ provides characteristic
factors which may have meaningful interpretation in physics, it is
customary  to  enhance  certain  features.  This  is  accomplished  by  rotation 
of  the  coordinate  system.  This  is  called  attaining  simple  structure.  The 
ultimate  goal  is  the  following: 

(a)  At  least  one  zero  in  each  row 

(b)  k  zeros  in  each  column  (k-1  for  principal  components) 

(c)  For  any  pair  of  factors: 

    1. High loading in one element 1.0
    2. Zero in other variables
    3. Small loading on both factors for the variable
    4. Only a few non-vanishing loadings on both.

In order to explain the rotation procedure let us recall that:

$$ M_A = M_E D_{\sqrt{\lambda}} \qquad (8) $$

where $M_E$ is an eigenvector matrix and $D_{\sqrt{\lambda}}$ is a diagonal matrix formed
from the eigenvalues $\lambda$, with elements $\sqrt{\lambda_j}$. Two methods of rotation
are customary: orthogonal and oblique rotation. In terms of mathematics
the orthogonal rotation is achieved by

$$ M_{FO} = M_A T_1 \qquad (9) $$

where $T_1$ is a transformation matrix. Oblique rotation requires two
transformation procedures because factor pattern and factor structure
matrix are not identical as in the orthogonal transformation.

Thus:

$$ M_{FP} = M_A T_2^{-1} \quad \text{(factor pattern matrix)} \qquad (10a) $$

$$ M_{FS} = M_A T_2 \quad \text{(factor structure matrix)} \qquad (10b) $$

While the factors are uncorrelated in the solution of Eqns 4-7 and the
orthogonal rotation, the oblique rotation introduces factors which are
correlated. Thus $M_{FP}$ represents the regression coefficients in the
structure pattern, and $M_{FS}$ the covariances between variables and
factors. The factor pattern is:

$$ x_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + \varepsilon_i \qquad (11) $$

where $M_{FP}$ determines the $a_{ij}$ and $M_{FS}$ the $f_j$ terms; $\varepsilon_i$ is the error.
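
The effect of an orthogonal rotation on a loading matrix can be illustrated with a small sketch. The two-factor loading matrix and the rotation angle below are hypothetical, and the plane rotation stands in for whatever transformation matrix a rotation program would actually select.

```python
import numpy as np

def rotate_orthogonal(A, angle):
    """Apply a plane rotation T_1 to a two-factor loading matrix A."""
    c, s = np.cos(angle), np.sin(angle)
    T1 = np.array([[c, -s],
                   [s,  c]])  # orthogonal: T1.T @ T1 is the identity
    return A @ T1

# Hypothetical unrotated loadings for three elements on two factors.
A = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9]])
A_rot = rotate_orthogonal(A, angle=0.5)

communalities_before = (A ** 2).sum(axis=1)
communalities_after = (A_rot ** 2).sum(axis=1)
factor_sums_before = (A ** 2).sum(axis=0)
factor_sums_after = (A_rot ** 2).sum(axis=0)
```

The communality of each element and the total sum of squared loadings are unchanged by the rotation, while the sum of squares attributed to each individual factor is redistributed.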

5. EIGENVALUES, FACTOR LOADS AND COMMUNALITIES. The four methods
introduced above for estimating the communalities have been applied to
atmospheric data of Stuttgart (Fed. Rep. Germany). The data cover the
period* Sept 1946-August 1953. Eight meteorological elements have
been selected: ceiling (CEIL), visibility (VIS), wind direction (WD),
windspeed (WS), temperature (TEMP), dewpoint (DEWP), relative
humidity (REHU) and pressure (PRES). Visibility was utilized in linear
scale and as transformed variate in logarithmic scale. The wind velocity
was also converted to zonal (U) and meridional (V) components. These
differences in the element selections will be discussed later.

Data as exhibited in Tables 1 and 2 were chosen as a typical
example for disclosing the diversity caused by different methods of
estimating the communalities. Table 1 displays the eigenvalues for data
from Stuttgart (linear visibility, zonal and meridional wind components).
We learn from perusal of Table 1 that the individual eigenvalues
fluctuate and depend on the chosen method. The dissimilarity is even
found in the sums of these eigenvalues. However, rotation of the
coordinate systems (orthogonal and oblique) has no effect on the sum, as
expected. The numerical values differ only by rounding.

The differences between the individual methods for the sum of
eigenvalues can be traced to the sum of communalities (Table 2). As
confirmed by the observed data the sum of eigenvalues must be identical
with the sum of the communalities save rounding. In the principal
components analysis this sum is identical with the number of elements
if the number of factors is not truncated.


We also notice in Table 1 that the truncated principal components
analysis shows the highest approximation (82%) of the total variance for
the chosen number of factors, in our case four.
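
The share of variance retained by the truncated principal components solution can be checked directly from the unrotated PC eigenvalues listed in Table 1. The short calculation below uses only those four tabulated values and the number of elements (eight); the variable names are illustrative.

```python
# Unrotated PC eigenvalues from Table 1 (Stuttgart, January).
pc_eigenvalues = [3.136, 1.695, 1.016, 0.720]
n_elements = 8  # total variance of a correlation matrix with 8 elements

retained = sum(pc_eigenvalues)        # sum of the four retained eigenvalues
share = retained / n_elements         # fraction of the total variance
first_factor_share = pc_eigenvalues[0] / n_elements

print(round(retained, 3), round(100 * share), round(100 * first_factor_share))
```

The retained sum is 6.567, about 82% of the total variance, and the first factor alone accounts for about 39%.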


*Footnote: We experienced difficulty with the magnetic tape record
after 7 years of data. The difficulty could not be resolved for inclusion
into this manuscript. Only Table 3 was available for 10 years.

While Table 1 exhibits fluctuations of the sum of eigenvalues from
6.567 to 5.040 these variations are not necessarily repeated for other
data  sets.  E.g.  Table  3  has  been  compiled  for  10  years  of  data  for 
Stuttgart  in  January,  substituting  visibility  in  its  transformed 
logarithmic  scale,  and  zonal  and  meridional  components  of  wind  have 
been  replaced  by  speed  and  direction  (see  Essenwanger,  1964).  We  learn 
that the sums of the eigenvalues for the three methods ULSQ, GLSQ, and
MXLI differ very little, although the individual eigenvectors show
dispersion.  Again,  the  truncated  principal  components  analysis  renders 
the  highest  approximation  of  the  variance  (about  81%). 

6. FACTOR LOADS, STRUCTURE MATRIX AND FACTOR PATTERN. Tables
4A-D provide detailed information about the factors. Four sections are
shown in each of Tables 4A-D. The first section provides the unrotated
factor loads for the solution with communalities. E.g. in the case of the
principal components method (Table 4A) these are the first 4
eigenvectors of a correlation matrix with unity in the diagonal.
The numerical values in these four columns represent the affinity with
the elements and can be interpreted as a (linear) correlation coefficient.

The first factor (Table 4A) which represents 39% of the variance
(i.e. 3.14/8.00) discloses high association with temperature, dewpoint,
zonal (U) and meridional (V) wind component and visibility, in that order
of magnitude. The second factor with about 21% of the variance is again
a mixture: relative humidity, visibility, dewpoint and ceiling. In the
third factor the pressure stands out while the fourth factor is again a
mixture whereby all elements are contributing except the relative
humidity (-.07 means almost zero).

The  unrotated  factor  load  is  a  valid  solution.  It  was  pointed  out 
previously  that  a  rotation  of  the  coordinates  will  enhance  the 
association between individual factor and element. This simplification
process  was  described  in  section  four.  The  sum  of  the  eigenvalues 
remains  constant  in  this  transformation. 

Inspection  of  the  section  for  orthogonal  rotation  in  Table  4A  reveals 
that  now  the  first  factor  principally  is  related  with  the  temperature 
elements,  i.e.  temperature  and  dewpoint.  The  second  factor  comprises 
the moisture elements (relative humidity, visibility and ceiling). The
third factor contains the pressure, and the fourth factor the wind. This
may be expected by some readers and may be a trivial answer. It should
be stressed, however, that the mathematical formalism could have led to
a different answer and combination of elements. The separation into
these four factors is logical on account of the physics background.

This  may  give  the  impression  that  the  grouping  into  these  4  factors  is 
trivial.  In  turn,  the  mathematical  formalism  has  led  in  this  case  to  an 
answer  which  has  an  interpretation  in  terms  of  physics.  However, 
beyond  the  expected  factors  we  gain  information  about  the  weights  of 
the  factors.  This  weight  is  not  readily  available  by  expectation  alone. 

The  lower  part  of  Table  4A  lists  the  result  for  an  oblique  rotation. 
While the structure matrix contains the covariances (which are
equivalent to the correlation coefficients), the factor pattern expresses
the regression coefficients. In the oblique rotation the factors are
intercorrelated  (see  Table  5).  They  are  not  correlated  with  each  other 
for  the  unrotated  or  the  orthogonal  solution.  We  learn  from  the 
structure  matrix  of  Table  4A  that  the  factors  have  not  essentially 
changed  from  the  orthogonal  rotation  case.  Therefore,  the 
intercorrelation  (between  factors)  is  very  low  (Table  5). 

The results for the other methods (ULSQ, GLSQ, MXLI) are similar
with minor changes, except that the weights are different for the
individual factors. In Table 4B we notice that the ceiling shows only
very low influence in any of the factors. This result is repeated in
Table 4C. While in the previous methods the pressure is one factor, it
shows virtually no contribution in the GLSQ method. It reappears as a
factor in Table 4D, MXLI method. Another difference between Tables 4A,
B and Tables 4C, D is the influence of the windspeed. In Table 4A the
factor with the two wind components indicates equal correlation of the
wind components. In Table 4B a small preference of the meridional
component is already visible. In Tables 4C, D, however, the meridional
wind component appears to be more dominant than the zonal influence in
the wind factor.

One further peculiarity must be mentioned. In the unrotated and
orthogonally rotated case the sum of the eigenvalues $SU_\lambda$ and $SO_\lambda$,
respectively, is equal to the sum of the squares of the factor
components:

$$ SU_\lambda = \sum_{i=1}^{n} f_U^2 \qquad (12a) $$

or

$$ SO_\lambda = \sum_{i=1}^{n} f_O^2 \qquad (12b) $$

where $f_U$ and $f_O$ denote the numerical value in the respective
factor column and n designates the number of elements. In the oblique
case we find

$$ SOB_\lambda = \sum_{i=1}^{n} f_S f_F \qquad (13) $$

where $f_S$ is the column value in the structure matrix and $f_F$ the
corresponding column value in the factor pattern. Although the sum of
$SOB_\lambda$ for the 4 factors renders the same numerical value as the
unrotated or orthogonally rotated case the individual items $SOB_\lambda$ can be
positive or negative in the maximum likelihood method (Table 6). The
exhibited case in Table 6 is not an isolated case or error as the first
impression may be. As can be seen from Table 7A a negative term
appears also in a combination of elements Ln VIS, WD, WS. In July
(Table 7B) this peculiarity did not show, and it almost rules out that it
is an error in the computer program. Thus the maximum likelihood
method, at least in our "canned computer program", appears to be very
sensitive to changes of the correlations in the input matrix.

7. FACTOR ANALYSIS. The detailed information on unrotated and
rotated factors is listed in Tables 4A-D for one version of a set of
elements. These detailed tabulations are somewhat difficult to read. In
order to enhance the significant features of the factors, two changes
were introduced for Tables 7A and B. First, all correlations r ≤ 0.4
were omitted except the maximum correlation in one line which could be
smaller than 0.4. Secondly the sign was omitted because the sign plays
only a role in formulating eqn (11) and performing calculations with it.
The magnitude is sufficient for evaluation of the factors.
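
The two simplifications used for Tables 7A and B can be sketched as a small filtering step. The function name and the loading values are hypothetical, and the treatment of the boundary (magnitudes of exactly 0.4 are dropped) is an assumption; omitted entries are marked as NaN.

```python
import numpy as np

def simplify_loadings(loadings, threshold=0.4):
    """Drop signs, and blank out small loadings except each row's maximum."""
    mag = np.abs(np.asarray(loadings, dtype=float))  # the sign is omitted
    keep = mag > threshold
    rows = np.arange(mag.shape[0])
    keep[rows, mag.argmax(axis=1)] = True            # always keep the row maximum
    return np.where(keep, mag, np.nan)

# Hypothetical loadings: two elements (rows) on two factors (columns).
loadings = np.array([[0.90, -0.20],
                     [0.30, -0.35]])
simplified = simplify_loadings(loadings)
```

In the second row no loading exceeds 0.4, so only the largest magnitude (0.35) is retained.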

In Table 7A, B eight atmospheric elements are shown. For these
eight elements visibility was used in its linear scale and with a
transformed (logarithmic) scale. In the top part of Tables 7A, B the
wind appears as speed and direction while in the center and lower
section the zonal and meridional components have been utilized. These
modifications lead to three different versions of factor analysis for the
same elements. Only the solutions with orthogonal and oblique rotation
are included in Tables 7A, B.

Table  7A  exhibits  the  condition  for  January.  The  significant 
features  do  not  vary  essentially  between  the  three  versions.  The  only 
exception  is  the  contribution  by  ceiling  of  clouds  which  renders  a 
significant factor for the ULSQ method (top and center) but is not a
special factor at the bottom section where it is replaced by the
pressure. The differences between individual methods (PC, ULSQ, GLSQ,
and MXLI) were mostly described in the previous section 6 and will not be
repeated here.

Table 7B provides the factor analysis for July at Stuttgart for the
same seven-year period of record. Again, it can be noticed
that the oblique rotation is not significantly different from the factors
provided by orthogonal rotation. Other data, not included here, follow the
same trend that orthogonal and oblique rotation do not differ significantly.
This fact may imply that orthogonal rotation may be sufficient for factor
analysis of atmospheric elements. Although the characteristic of factors
shows a similar pattern in July as given for January, some differences
exist. Besides the mentioned difference in the contribution by the ceiling
a  major  change  has  occurred  in  the  association  of  elements.  Relative 
humidity  and  visibility  are  now  associated  with  temperature  in  three  of 
the  four  methods  for  all  three  versions.  This  first  factor  proves  to  be  the 
dominant  influence  but  not  by  much. 

The  primary  purpose  of  this  study  was  not  the  illustration  of  the 
changes  throughout  the  year  but  the  exhibition  of  the  differences  in  the 
utilization  of  the  individual  methods.  Although  variations  exist,  a  close 
perusal  reveals  that  physical  characteristics  of  the  system  do  not  differ 
too  much  in  the  individual  methods. 

8. CONCLUSION AND SUMMARY. The present study illustrates that the
estimation approach for the communalities by different methods (eqn 5-7)
leads to different factors. They are more uniform, however, after rotation
of the factors. This confirms that the basic problem in factor analysis has
not been resolved as of today, namely the derivation of suitable
estimators for the communalities (see Cattell, 1965 or Guttman, 1956).

As  the  study  proves,  however,  the  physical  features  after  rotation  of  the 
factors  show  major  agreement,  although  differences  in  details  and  in  the 
sequence  of  importance  of  factors  can  be  found. 

This study revealed that for atmospheric elements the factors
derived by oblique rotation do not differ significantly from factors
obtained by orthogonal rotation. This may imply that the elaborate
mathematical procedure for oblique rotation could be dispensed with in
favor of the simpler and less costly orthogonal rotation.

The factors appearing in the January data are related to four simple
combinations: temperature, wind, moisture and pressure. This simple
division is not repeated in the July data. However, the factors resulting
from the analysis procedure do not give physically unreasonable
combinations. For example, the combination of temperature with visibility
and relative humidity may be explained by the relationship between reduced
radiation during high relative humidity and low visibility, and vice
versa. Also, the combination of a wind component with temperature terms
may reflect the circulation of air at either the macro- or meso-scale.
Other detailed features in the patterns of factors may be reserved for a
further study.

Finally, no specific recommendation as to the "best suitable method"
for estimating the communalities can be made at the present time.
TABLE 1.  COMPARISON OF EIGENVALUES, FACTOR LOADS
(STUTTGART, JANUARY)

(1) Unrotated Factor Loads

          PC      ULSQ    GLSQ    MXLI
λ1       3.136   2.929   2.811   2.303
λ2       1.695   1.385   1.590   1.462
λ3       1.016   0.924   0.636   1.328
λ4       0.720   0.432   0.003   0.789
Σλ       6.567   5.670   5.040   5.882

(2) Orthogonal Factor Load

          PC      ULSQ    GLSQ    MXLI
λ1       2.150   2.157   2.252   2.196
λ2       1.611   1.152   1.257   1.076
λ3       1.200   1.080   1.528   1.272
λ4       1.601   1.273   0.003   1.337
Σλ       6.562   5.662   5.040   5.881

(3) Oblique Structure Matrix

          PC      ULSQ    GLSQ    MXLI
λ1       2.128   2.102   2.192   1.189
λ2       1.613   1.170   1.262   2.238
λ3       1.203   1.081   1.576   1.460
λ4       1.622   1.311   0.011   0.994
Σλ       6.566   5.664   5.041   5.881

TABLE 2.  COMMUNALITIES
(STUTTGART, JANUARY)

          PC      ULSQ    GLSQ    MXLI
CEIL      .697    .234    .159    .200
VISIB     .758    .504    .399    .428
U         .729    .507    .424    .477
V         .811    .714   1.000    .781
TEMP      .947   1.002   1.000    .996
DEWP      .988   1.007   1.000   1.000
REHU      .749    .693   1.000    .999
PRES      .887   1.002    .058   1.000
Σ        6.566   5.663   5.040   5.881

TABLE 3.  EIGENVALUES AND COMMUNALITIES
STUTTGART, JANUARY, 1946-1956, Ln Vis, WDD, WSP

(A) EIGENVALUES (ORTHO. FACT. LOAD)

          PC      ULSQ    GLSQ    MXLI
λ1       2.207   1.863   2.042   1.868
λ2       2.053   1.532   1.310   1.525
λ3       1.254   1.185   1.230   1.188
λ4       1.004   1.062   1.018   1.063
Σλ       6.518   5.642   5.600   5.642

(B) COMMUNALITIES
[variable labels not legible in the source; rows in source order]

          PC      ULSQ    GLSQ    MXLI
          .802   1.000    .146   1.000
          .740    .532    .441    .531
          .630    .498   1.000    .501
          .712    .592   1.000    .591
          .941    .990   1.000    .990
          .996   1.000    .995   1.000
          .705   1.000   1.000   1.000
          .991    .031    .018    .031
Σ        6.517   5.643   5.600   5.644

TABLE 4A.  FACTOR LOADS, STRUCTURE MATRIX AND FACTOR PATTERN
(STUTTGART, JANUARY)
PRINCIPAL COMPONENTS

                  UNROTATED                     ORTHOG. ROT.
          PC    ULSQ   GLSQ   MXLI      PC    ULSQ   GLSQ   MXLI
CEIL      .44   -.46    .48    .24      .19   -.49    .54    .37
VIS      -.59   -.58    .05    .26     -.35   -.73   -.18   -.26
U        -.76   -.04    .10   -.38     -.40   -.05   -.05   -.75
V         .67    .38    .03    .47      .12    .29    .10    .84
TEMP     -.87    .18    .27    .28     -.92   -.12   -.11   -.26
DEWP     -.80    .47    .26    .23     -.95    .18   -.09   -.20
REHU      .10    .86    .00   -.07     -.21    .82    .02    .16
PRES      .39    .09    .80   -.30      .08    .16    .92    .03
Σλ²      3.14   1.70   1.02    .72     2.15   1.61   1.20   1.60

OBLIQUE ROTATION

              STRUCTURE MATRIX                 FACTOR PATTERN
CEIL      .28   -.44    .58    .39      .10   -.50    .52    .36
VIS      -.37   -.75   -.21   -.38     -.32   -.73   -.13   -.16
U        -.49   -.11   -.13   -.81     -.29   -.00    .01   -.73
V         .22    .36    .17    .87     -.02    .23    .05    .83
TEMP     -.95   -.14   -.18   -.42     -.91   -.14   -.04   -.14
DEWP     -.98    .16   -.17   -.34     -.95   -.16   -.04   -.09
REHU     -.21    .83    .01    .19     -.24    .81    .01    .13
PRES      .13    .19    .92    .11      .03    .19    .93   -.06

Structure Matrix = Covariance
Factor Pattern = Regression Coefficients
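
The communalities of Table 2 can be cross-checked against the loadings of Table 4A: a communality is the sum of a variable's squared loadings over the retained factors, and this sum is unchanged by an orthogonal rotation. The following is a minimal sketch, assuming NumPy; the CEIL values are copied from Table 4A, while the 8 x 4 matrix `A` and the rotation `T` are hypothetical and serve only to illustrate the invariance.

```python
import numpy as np

# CEIL row of Table 4A (principal components): unrotated and orthogonally
# rotated loadings on the four retained factors.
ceil_unrotated = np.array([0.44, -0.46, 0.48, 0.24])
ceil_rotated = np.array([0.19, -0.49, 0.54, 0.37])


def communality(loadings):
    """Sum of squared loadings of one variable across the retained factors."""
    return float(np.sum(np.square(loadings)))


h2_unrotated = communality(ceil_unrotated)
h2_rotated = communality(ceil_rotated)

# Hypothetical loading matrix A (8 variables x 4 factors) and a random
# orthogonal matrix T: row sums of squares are the same before and after.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(8, 4))
T, _ = np.linalg.qr(rng.normal(size=(4, 4)))
invariant = np.allclose((A ** 2).sum(axis=1), ((A @ T) ** 2).sum(axis=1))

print(h2_unrotated, h2_rotated, invariant)
```

Both CEIL values agree with the tabulated PC communality of .697 to within the two-decimal rounding of the printed loadings.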

TABLE 4B.  FACTOR LOADS, STRUCTURE MATRIX AND FACTOR PATTERN
STUTTGART, JANUARY
UNWEIGHTED LEAST SQUARE

                  UNROTATED                     ORTHOG. ROT.
          PC    ULSQ   GLSQ   MXLI      PC    ULSQ   GLSQ   MXLI
CEIL      .35   -.23    .21   -.12      .29   -.25    .23   -.19
VIS       .51   -.47    .12   -.11     -.25   -.54   -.14    .36
U        -.66   -.73    .07    .26     -.39   -.05   -.10    .58
V         .61    .40   -.09   -.42      .15    .23    .10   -.79
TEMP     -.92    .19    .19   -.28     -.95   -.17   -.07    .25
DEWP     -.85    .51   -.09   -.11     -.95    .21   -.07    .22
REHU     -.07    .76   -.21    .25     -.15    .81    .00   -.12
PRES      .40    .26    .87    .?1      .07   -.01    .99   -.11
Σλ²      2.93   1.38   0.92   0.43     2.16   1.15   1.08   1.27

              STRUCTURE MATRIX                 FACTOR PATTERN
CEIL      .32   -.23    .26   -.23      .23   -.27    .22   -.18
VIS      -.29   -.58   -.18    .46     -.22   -.53   -.10    .26
U        -.46   -.11   -.16    .66     -.28    .00   -.04    .56
V         .24    .33    .16   -.83     -.01    .16    .04   -.79
TEMP     -.97   -.16   -.15    .46     -.94   -.18   -.03    .11
DEWP     -.98    .22   -.14    .40     -.92    .20   -.03    .14
REHU     -.15    .82    .02   -.16     -.12    .81    .00   -.03
PRES      .12    .11    .99    .21      .01    .06    .99   -.04

Structure Matrix = Covariance
Factor Pattern = Regression Coefficients

TABLE 4C.  FACTOR LOADS, STRUCTURE MATRIX AND FACTOR PATTERN
STUTTGART, JANUARY
GENERAL LEAST SQUARES

                  UNROTATED                      ORTHO. ROT
          PC    ULSQ   GLSQ   MXLI      PC    ULSQ   GLSQ   MXLI
CEIL     -.36    .14    .10    .03     -.33    .16    .15    .02
VIS       .39    .49    .08    .01      .24    .44   -.38    .01
U         .60    .19   -.17   -.003     .42    .06   -.49   -.003
V        -.54   -.61    .57    .000    -.11   -.15    .98    .01
TEMP      .96    .03    .26   -.03      .94    .21   -.25   -.02
DEWP      .95   -.29    .10    .03      .97   -.14   -.19    .04
REHU      .09   -.90   -.42   -.01      .21   -.97    .15    .005
PRES     -.20   -.12    .06   -.01     -.13   -.06    .20   -.004
Σλ²      2.81   1.59   0.64   0.003    2.25   1.26   1.53    .003

              STRUCTURE MATRIX                 FACTOR PATTERN
CEIL     -.34    .17    .18   -.03     -.30    .18    .14    .03
VIS       .26    .46   -.44   -.19      .22    .43   -.33    .007
U         .46    .08   -.55   -.10      .36    .03   -.47   -.008
V        -.18   -.22    .99    .31      .02   -.07    .98    .01
TEMP      .96    .18   -.39   -.02      .94    .20   -.17   -.03
DEWP      .98   -.18   -.31    .15      .94   -.13   -.15    .04
REHU      .21   -.99    .19    .35      .16   -.96    .08    .004
PRES     -.14   -.07    .22    .05     -.10   -.05    .19   -.003

Structure Matrix = Covariances
Factor Pattern = Regression Coefficients

TABLE 4D.  FACTOR LOADS, STRUCTURE MATRIX, FACTOR PATTERN
STUTTGART, JANUARY
MAXIMUM LIKELIHOOD

                  UNROTATED                      ORTHO. ROT
          PC    ULSQ   GLSQ   MXLI      PC    ULSQ   GLSQ   MXLI
CEIL      .42   -.09    .11   -.08      .31    .23    .17   -.14
VIS      -.31   -.00    .52    .24     -.24   -.15    .43    .40
U        -.46    .23    .20    .42     -.41   -.10    .04    .55
V         .32   -.05   -.35   -.74      .14    .09   -.17   -.85
TEMP     -.73    .58    .35   -.00     -.95   -.06    .17    .25
DEWP     -.76    .65    .00    .00     -.96   -.06   -.18    .19
REHU     -.16    .29   -.94    .00     -.16   -.01   -.98   -.14
PRES      .76    .65    .00    .00      .08    .99   -.07   -.12
Σλ²      2.30   1.33   1.46    .79     2.20   1.08   1.27   1.34

              STRUCTURE MATRIX                 FACTOR PATTERN
CEIL      .25   -.36   -.11   -.10      .22   -.43    .16   -.08
VIS      -.24    .23    .56    .39     -.22   -.14    .55    .25
U        -.18    .49    .43    .53     -.14    .35    .11    .44
V         .21   -.27   -.43   -.83      .19   -.02   -.20   -.78
TEMP     -.15    .93    .81    .23     -.08    .68    .42    .001
DEWP     -.11   1.00    .56    .16     -.04    .99    .00    .00
REHU      .08    .31   -.61   -.18      .10    .96   -.01   -.01
PRES     1.0    -.11   -.06   -.02      .99   -.04   -.00    .00

Structure Matrix = Covariances
Factor Pattern = Regression Coefficients

TABLE 5.  INTERCORRELATION BETWEEN FACTORS
(OBLIQUE ROTATION)

A) Principal Components Analysis

   1.0   -.02    .13    .28
  -.02   1.0     .02    .15
   .13    .02   1.0     .15
   .28    .15    .15   1.0

B) Unweighted Least Squares

   1.0   -.05    .10   -.33
  -.05   1.0     .04   -.21
   .10    .04   1.0    -.16
  -.33   -.21   -.16   1.0

C) General Least Squares

   1.0   -.07   -.20    .12
  -.07   1.0    -.15   -.32
  -.20   -.15   1.0     .27
   .12   -.32    .27   1.0

D) Maximum Likelihood

   1.0   -.08   -.04   -.01
  -.08   1.0     .56    .16
  -.04    .56   1.0     .28
  -.01    .16    .28   1.0
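
For an oblique solution, the structure matrix equals the factor pattern post-multiplied by the matrix of factor intercorrelations. The sketch below, assuming NumPy, checks this relation for the CEIL row of Table 4A (principal components) against part A of Table 5; all numbers are copied from those tables, and the variable names are illustrative.

```python
import numpy as np

# Factor intercorrelations, Table 5(A), principal components analysis.
phi = np.array([
    [1.00, -0.02, 0.13, 0.28],
    [-0.02, 1.00, 0.02, 0.15],
    [0.13, 0.02, 1.00, 0.15],
    [0.28, 0.15, 0.15, 1.00],
])

# CEIL row of Table 4A: factor pattern and tabulated structure entries.
pattern_ceil = np.array([0.10, -0.50, 0.52, 0.36])
structure_ceil_table = np.array([0.28, -0.44, 0.58, 0.39])

# Structure = Pattern x Phi (correlations of the variable with the factors).
structure_ceil = pattern_ceil @ phi
print(np.round(structure_ceil, 2))
```

The product reproduces the tabulated structure row to within the two-decimal rounding of the printed entries, which also supports the row and column assignment used when reading Tables 4A and 5.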


TABLE 6.  VARIANCE COMPONENTS FOR THE MAXIMUM LIKELIHOOD
METHOD (JANUARY, STUTTGART, LN VIS, U, V)

         UNROT    ORTH. ROT   OBLIQUE ROT.
λ1       2.116      1.102        1.469
λ2       1.374      1.255        3.934
λ3       1.448      2.003       12.392
λ4       0.830      1.407      -12.028
Σ        5.768      5.767        5.767

Small second-order composite designs were suggested by Hartley (1959). Westlake (1965)
provided even smaller designs for k = 5, 7, and 9 factors, for which intricate construction
methods were needed. Here, simple designs formed using Plackett and Burman (1946) designs
are offered for k = 5, 7, and 9. Designs with one run fewer than Westlake's for k = 5 and 7 and
three fewer for k = 9 are feasible by deleting repeat points that occur in some of the designs.


KEY  WORDS:  Center  points;  Composite  designs;  Factorial  designs;  Plackett  and  Burman 
designs;  Response  surfaces. 


1.  INTRODUCTION 

Suppose we are going to examine k predictor variables, coded to
$x_1, x_2, \ldots, x_k$, to determine their effects on a response
variable y subject to random error. We might first wish to perform a
first-order design to fit the model
$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \epsilon$.
If no progress appeared possible (for example, via steepest ascent), we
might then wish to add a few runs to enable the more comprehensive
second-order model,

$$y = \beta_0 + \sum_{i} \beta_i x_i + \sum\sum_{i \le j} \beta_{ij} x_i x_j + \epsilon, \qquad (1)$$

to be examined, where all summations are taken over i, j = 1, 2, ..., k.
Many possible second-order sequential designs may be used to obtain the
data for such a fitting. The specific choice of design would depend on
the relative importance to the experimenter of various design features
(for example, see Box and Draper 1975, p. 347). One extremely useful type
of sequential second-order design is the composite design. As initially
suggested by Box and Wilson (1951) and followed up by Box and Hunter
(1957), it consists of a $2^k$ factorial or a $2^{k-q}$ fractional
factorial portion, with runs selected from the $2^k$ runs
$(x_1, x_2, \ldots, x_k) = (\pm 1, \pm 1, \ldots, \pm 1)$, of resolution V
or higher (for example, see Box and Hunter 1961 or Box, Hunter, and Hunter
1978), plus a set of 2k axial points at distances $\alpha$ from the
origin, plus $n_0$ center points. In general, the $2^{k-q}$ portion or
cube may be repeated c times, and the axial points or star may be repeated
s times. The values of $\alpha$, $n_0$, c, and s are to be selected.

Suppose, of the various design criteria, we decide to emphasize having
only a small number of runs. Such a course of action might be appropriate
if runs were expensive, difficult, or time-consuming, or if a complicated
computer model were to be approximated locally by a second-order surface.
Of course there must be at least $\tfrac{1}{2}(k + 1)(k + 2)$ points in
the design, this being the number of coefficients to estimate in (1).
Hartley (1959) pointed out that the cube portion of the composite design
need not be of resolution V. It could, in fact, be of resolution as low as
III, provided that two-factor interactions were not aliased with
two-factor interactions. (Two-factor interactions could be aliased with
main effects, because the star portion provides additional information on
the main effects.) This idea permitted much smaller cubes to be used.
Westlake (1965) took this idea further by finding even smaller cubes for
the k = 5, 7, and 9 cases. Table 1 shows the numbers of points in the
various designs suggested, for $2 \le k \le 9$.
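
The coefficient count that sets the lower bound on design size is simple arithmetic, sketched below in Python. The function names are illustrative; the second function assumes a single star of 2k axial points and no center points, so that the remaining coefficients must be covered by cube points.

```python
def n_coefficients(k):
    """Number of coefficients in the full second-order model in k factors."""
    return (k + 1) * (k + 2) // 2


def min_cube_points(k):
    """Coefficients not covered by the 2k axial points (no center points)."""
    return n_coefficients(k) - 2 * k


for k in range(2, 10):
    print(k, n_coefficients(k), 2 * k, min_cube_points(k))
```

For k = 5, 7, and 9 the second function gives 11, 22, and 37 cube points, which is why small Plackett and Burman designs with 12 or more runs are natural candidates for the cube portion.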

Westlake (1965) provided (in an appendix) three examples of 22-run
designs for k = 5, one example of a 40-run design for k = 7, and one
example of a 62-run design for k = 9. He noted that for k = 7 or 9,
"systematic generation of all possible designs ... appears to be almost
out of the question" (p. 332).


Table 1. Points Needed by Some Small Composite Designs

Factors, k                               2    3    4    5    6    7    8    9
Coefficients, (1/2)(k + 1)(k + 2)        6   10   15   21   28   36   45   55
Points in Box-Hunter (1957) designs      8   14   24   26   44   78   80  146
Hartley's number of points               6   10   16   26   28   46   48   82
Westlake's number of points              -    -    -   22    -   40    -   62

2. CONSTRUCTING SMALL COMPOSITE DESIGNS

Can Westlake's small numbers of runs for the k = 5, 7, and 9 cases be
beaten? The surprising answer is yes. Moreover, for k = 5 and 9 it is
possible to equal the number of runs in a simple manner, and for k = 7,
simple designs are available with only 42 runs, two more than Westlake's
40. The overall advantage of these suggested designs is that none of the
ingenuity shown by Westlake (1965) is needed, thanks to Plackett and
Burman (1946), and yet an apparently large selection of possibilities is
immediately available. (As we shall see later, the selection is not as
large as first appears!)

The  basic  method  can  be  simply  stated:  (a)  Use,  for 
the  cube  portion  of  the  design,  k  columns  of  a  Plackett 
and  Burman  (1946)  design,  (b)  Where  repeat  runs 
exist,  remove  one  of  each  duplicate  pair  to  reduce  the 
number  of  runs. 
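
Step (a) can be sketched numerically for k = 5. The code below, assuming NumPy, builds a 12-run Plackett and Burman design from the commonly tabulated cyclic generating row, takes five of its columns as the cube, appends a star, and checks that the 21-column second-order model matrix has full rank. The function names are illustrative, and the axial distance 2 is the value the paper reports using for k = 5 in its later computations; nothing here reproduces the author's own program.

```python
from itertools import combinations

import numpy as np

# Commonly tabulated generating row of the 12-run Plackett-Burman design.
GENERATOR = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])


def plackett_burman_12():
    """Eleven cyclic shifts of the generating row plus a final row of -1's."""
    rows = [np.roll(GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)


def second_order_matrix(points):
    """Model matrix with columns 1, x_i, x_i^2, and x_i x_j for i < j."""
    n, k = points.shape
    cols = [np.ones(n)]
    cols += [points[:, i] for i in range(k)]
    cols += [points[:, i] ** 2 for i in range(k)]
    cols += [points[:, i] * points[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)


def composite_design(columns, alpha=2.0):
    """Cube from the chosen Plackett-Burman columns plus 2k axial points."""
    cube = plackett_burman_12()[:, list(columns)].astype(float)
    k = cube.shape[1]
    star = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    return np.vstack([cube, star])


X = second_order_matrix(composite_design((0, 1, 2, 3, 4)))
rank = np.linalg.matrix_rank(X)

# Repeat the rank check over every choice of five of the eleven columns.
n_nonsingular = sum(
    np.linalg.matrix_rank(second_order_matrix(composite_design(c))) == 21
    for c in combinations(range(11), 5)
)
print(X.shape, rank, n_nonsingular)
```

A rank of 21 means X'X is invertible, so all coefficients in (1) are estimable from the 22 runs. Step (b), deleting one run of a duplicate pair, is not carried out in this sketch.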

Let (1) be written in the matrix form $y = X\beta + \epsilon$. If
$(X'X)^{-1}$ exists, we have a valid second-order response-surface design
that will estimate all of the parameters in (1). To avoid the possibility
of actual or near singularity merely due to choice of $\alpha$, I
initially followed Westlake (1965) by selecting the star with unit axial
distance, namely with points $(\pm 1, 0, \ldots, 0)$,
$(0, \pm 1, \ldots, 0)$, ..., $(0, 0, \ldots, \pm 1)$. In practice, this
value of $\alpha$ may be varied, since its value does not affect the
singularity or nonsingularity of the design, apart from the following
feature: When $\alpha \ne k^{1/2}$, the design has two spheres of points
with radii $k^{1/2}$ and $\alpha$, so center points are not needed (see
Box and Hunter 1957, p. 217). If the choice $\alpha = k^{1/2}$ were made,
however, center points would be essential to avoid design singularity. In
later computations reported here, I used the values $\alpha = 2$ (for
k = 5), $\alpha = 8^{1/2} = 2.828427$ (k = 7), and
$\alpha = 2^{7/4} = 3.363586$ (k = 9). These were suggested by a referee,
because they are the values that provide rotatable designs if a $2^{k-1}$
design is used with a star of axial distance $\alpha$ for k = 5 and 7, and
if a $2^{k-2}$ design is used similarly for k = 9.
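
The three axial distances quoted above follow from the standard rotatability condition for a composite design with one cube and one star, namely that the axial distance equals the fourth root of the number of cube points. A short Python check, with an illustrative function name:

```python
def rotatable_alpha(n_cube_points):
    """Axial distance giving rotatability: fourth root of the cube size."""
    return n_cube_points ** 0.25


alphas = {
    5: rotatable_alpha(2 ** (5 - 1)),  # half fraction, 16 cube points
    7: rotatable_alpha(2 ** (7 - 1)),  # half fraction, 64 cube points
    9: rotatable_alpha(2 ** (9 - 2)),  # quarter fraction, 128 cube points
}
for k, alpha in alphas.items():
    print(k, round(alpha, 6))
```

These values describe the reference fractional factorial cubes named in the text, not the smaller Plackett and Burman cubes, so the resulting small designs are not claimed to be rotatable.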

3.  CASE  k  =  5 

There are 21 coefficients to estimate, and there are 10 axial points.
The difference of 11 is thus the minimum possible number of cube points
required. An obvious choice is to use five (of the 11) columns of a 12-run
Plackett and Burman (1946) design. There are $\binom{11}{5}$, that is,
462 possible choices, all of which produce nonsingular designs. These
require 22 runs, the same number as Westlake's. A detailed examination of
the cube portions for the designs shows that there are two basic types;
standardized versions of these appear in Table 2.


Table 2. Two Essentially Different Choices of Five Columns From a 12-Run
Plackett and Burman Design: (a) With a Pair of Repeat Runs; (b) With a
Mirror-Image Pair of Runs


NOTE: All other choices are equivalent to one of these, subject to
changes in signs throughout one or more columns, renaming of variables,
and reordering of runs.


4.  CASE  k  =  7 

There are 36 coefficients to estimate, and there are 14 axial points.
Thus a minimum of 22 cube points is needed. First an attempt was made to
form designs using seven (of the 23) columns of the 24-run Plackett and
Burman design. Tries with columns (1-7), (1, 2, 4, 5, 8, 9, 10),
(3-5, 7-10), and (1, 3, 4, 7-10) all produced singular X'X matrices.
There are, in all, 245,157 possible column choices, and it is conjectured
that all will fail.

A second attempt used seven (of the 27) columns of the 28-run Plackett
and Burman design. More than 20 tries all produced nonsingular designs
with no failures, and it is conjectured that all of the 888,030 choices of
seven columns from 27 will do the same. These designs have 42 runs, a
modest two more than Westlake's 40, but reduced designs with fewer runs
are also possible.

Features we have already noted in the k = 5 case also arise here. Many
of the possible column choices provide identical or essentially identical
sets of points; some choices provide repeat runs and some provide
mirror-image runs. A new feature for k = 7 is that some sets of columns
provide both repeats and mirror images, and some neither!

How many distinct designs are there? Based on the number of different
|X'X| values found in a trial-and-error selection of designs, there are
at least 15.

5. CASE k = 9

There are 55 coefficients to estimate, and there are 18 axial points.
Thus a minimum of 37 cube points is needed. One possibility is to use nine
(of the 39) columns of the 40-run Plackett and Burman design. Tries with
columns (1-9) and (2-9, 39) failed, producing a singular X'X matrix. It is
conjectured that all 211,915,132 possible choices will fail similarly.
Parallel to this, I note Westlake's (1965) remark that, for a 3/16
fraction of a $2^7$, "while one apparently valid defining relation exists,
it is impossible to pick three 1/16 replicates so as to give a
non-singular X'X matrix" (p. 329).

A  second  attempt  used  nine  (of  the  43)  columns  of 
the  44-run  Plackett  and  Burman  design.  More  than  20 
tries  all  produced  nonsingular  62-run  designs,  the 
same  number  of  runs  as  Westlake’s.  There  were  no 
failures,  and  it  is  conjectured  that  all  563,921,995 
column  choices  will  produce  nonsingular  designs. 
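
The numbers of possible column choices quoted in Sections 3 to 5 are binomial coefficients and can be recomputed directly. A minimal check using the Python standard library; the dictionary keys are only labels of the form (columns available, columns chosen).

```python
from math import comb

# (columns available in the Plackett-Burman design, columns chosen = k)
choices = {
    (11, 5): comb(11, 5),   # 12-run design, k = 5
    (23, 7): comb(23, 7),   # 24-run design, k = 7
    (27, 7): comb(27, 7),   # 28-run design, k = 7
    (39, 9): comb(39, 9),   # 40-run design, k = 9
    (43, 9): comb(43, 9),   # 44-run design, k = 9
}
for (n, k), count in choices.items():
    print(f"C({n},{k}) = {count:,}")
```

The size of the last two counts explains why only a sample of column choices was tried for k = 9 and why the general statements are offered as conjectures.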

Features similar to the k = 7 case again arise. Designs certainly exist
with up to three pairs of repeats and up to two pairs of mirror-image
runs.

ABSTRACT

In the army sensitivity testing environment it is often desired to estimate $V_{50}$, the
velocity at which 1/2 of a given projectile population would penetrate a given plate of
armor. Excessive cost of experimental units usually necessitates the use of very small
samples, often fewer than 15. Several studies have been done to examine the performance
of some of the available design and estimation techniques under restrictive sample sizes.
Discussed will be some extensions of those studies with emphasis on additional practical
environment considerations such as nonnormal response functions, stimulus noise,
estimate existence, and initial design point selection.

INTRODUCTION


In the army quantal response testing environment, excessive cost of experimental
units usually necessitates the use of small samples. Several small sample studies have
been done to examine the performance of some of the available design and estimation
techniques. This paper discusses extensions of those studies including additional
practical environment considerations such as estimate existence, nonnormal response
functions, and stimulus noise.

The quantal response testing environment is one in which there are only two
possible outcomes for each experimental unit. For example, if a projectile were fired
against a plate of armor one could observe a penetration (response) or a nonpenetration.
Continuing with this example, suppose an experimenter wishes to assess the performance
of a particular projectile. One way to characterize performance is to consider the
probability of a projectile perforating the armor at various velocities. Thus, assessing
the performance of a projectile in this manner amounts to establishing some appropriate
probability distribution.

Assume that associated with every projectile is a critical velocity above which the
projectile would penetrate the armor and below which it would fail to penetrate. Then
critical velocity is a continuous random variable. What is left for the experimenter is to
characterize the probability measure associated with the random variable, critical
velocity. Note that critical velocity is not directly observable since in no way can the
experimenter sample directly from a population of critical velocities. Rather, the
experimenter can only collect (response, nonresponse) data. If a response is observed at
a particular velocity then all that can be said is that that velocity was in excess of the
critical velocity for that particular projectile. In this manner data can be collected
pertinent to the response function, or the probability distribution of critical velocity.
Historically in testing these projectiles, the median of this distribution, $V_{50}$, is of
particular interest primarily because it takes fewer rounds to estimate than other
quantiles. We will continue with that convention here.

Our purpose in examining this problem was twofold. The first was to examine the
effect of day-to-day problems in sensitivity testing under a representative 'in practice'
scenario. The second was to compare several design and estimation procedures in this
'in practice' setting. Our attention here will be focused on the first purpose.


DESIGN  CONSIDERATIONS 

A detailed Monte-Carlo study was performed which incorporated some problems
encountered in practice. Under each set of test conditions 700 iterations were run giving
rise to estimates of $V_{50}$. The response for this study was taken to be the sample
population of the estimate of $V_{50}$, expressed in terms of the empirical density, its
mean, and in particular the $\sqrt{\mathrm{MSE}}$.

The test design appears in Figure 1. Five designs, each in conjunction with three
estimation procedures, were used in this study. The Delayed Robbins-Monro (DRM) and
the Adaptive Robbins-Monro (ARM) are variations of the well-known Stochastic
Approximation Method of Robbins and Monro. The Estimated Quantal Response Curve
(EQRC), used in conjunction with DRM and ARM in this study, is a recent technique
introduced by Wu (1985). The Langlie procedure is one currently used in much of the
army's quantal response testing. These five constitute some reasonable designs for use in
our testing environment. References are cited at the conclusion of this paper for those
interested in the details of these procedures.

The first estimation procedure is a maximum likelihood estimation method with an
assumed normal response function and is denoted NMLE. The second (AVR) is an
arithmetic average of the velocities giving rise to the k lowest responses and the k
highest nonresponses, where k is usually taken to be 2 or 3. This second estimate is
frequently used by Aberdeen Proving Ground, particularly in the absence of a unique
maximum likelihood estimate. The last, Next Stress, is simply the next design point of
the sequential design. For DRM, ARM, and EQRC, Next Stress is the intended estimate.
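
The AVR estimate described above can be sketched in a few lines of Python. The function name, the error handling for too few responses or nonresponses, and the sample data are all hypothetical; only the averaging rule itself comes from the description in the text.

```python
def avr_estimate(velocities, responses, k=3):
    """Average the k lowest response and k highest nonresponse velocities."""
    hits = sorted(v for v, r in zip(velocities, responses) if r)
    misses = sorted(v for v, r in zip(velocities, responses) if not r)
    if len(hits) < k or len(misses) < k:
        raise ValueError("need at least k responses and k nonresponses")
    chosen = hits[:k] + misses[-k:]
    return sum(chosen) / len(chosen)


# Hypothetical nine-round test: striking velocities and outcomes (1 = penetration).
velocities = [980, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1080]
responses = [0, 0, 0, 1, 0, 1, 1, 1, 1]

estimate = avr_estimate(velocities, responses, k=2)
print(estimate)
```

With k = 2 the rule averages the two lowest penetrating velocities (1020, 1040) and the two highest nonpenetrating velocities (1010, 1030). Unlike a maximum likelihood estimate, this average exists whenever at least k outcomes of each kind are observed.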

The  above  designs  and  estimation  techniques  were  compared  under  the  following 
test  conditions.  For  some  more  expensive  rounds,  experimenters  fire  15  rounds  in  hopes 
of  getting  12  or  more.  Some  are  disqualified  due  to  erratic  flight  of  the  round.  Recently 
the  encouraged  policy  has  been  to  use  as  few  as  9.  Thus,  representative  sample  sizes  of 
9,  12,  and  15  were  considered. 

Another factor to be accounted for is noise associated with the firing velocity of
each round. It is not possible for experimenters to control precisely the velocity at
which a round is fired. In fact, for some extensively studied data sets the ratio of the
estimated noise standard deviation to the estimated population standard deviation
(assuming normal response function) was $.15\sigma$ or more. It was thought that this
amount of variation would limit the ability of a sequential design to converge on
$V_{50}$. Three levels of noise were considered: the absence of noise, normal
$(0, [.15\sigma]^2)$, and exponential with median 0 and standard deviation $.15\sigma$.
In each of the above and in the following, $\sigma$ is the standard deviation of the
response function.
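
One way to realize the three noise levels in a simulation is sketched below in Python. An exponential variable with scale s has standard deviation s and median s ln 2, so subtracting s ln 2 gives median 0 with the long tail toward higher velocities. The function names and the `ratio` parameter are illustrative assumptions, not the study's actual code.

```python
import math
import random


def velocity_noise(kind, sigma, rng, ratio=0.15):
    """Draw one velocity error for the given noise level."""
    s = ratio * sigma
    if kind == "none":
        return 0.0
    if kind == "normal":
        return rng.gauss(0.0, s)
    if kind == "exponential":
        # Exponential with standard deviation s, shifted to have median 0.
        return rng.expovariate(1.0 / s) - s * math.log(2.0)
    raise ValueError(f"unknown noise kind: {kind}")


def shifted_exponential_cdf(x, s):
    """CDF of the median-zero exponential noise with standard deviation s."""
    shift = -s * math.log(2.0)
    return 0.0 if x < shift else 1.0 - math.exp(-(x - shift) / s)


rng = random.Random(1985)
draws = [velocity_noise("exponential", 1.0, rng) for _ in range(5)]
print(draws, shifted_exponential_cdf(0.0, 0.15))
```

The closed-form distribution function confirms that half of the noise mass lies below zero, while the mean of the shifted exponential is positive, which is the asymmetry the study is probing.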

Input from the experimenter is used for establishing the initial design point
(starting value) and the range (gate width) over which the median $V_{50}$ can be found.
The latter is used in establishing the magnitude of step sizes in the sequential designs
and actually bounds acceptable design points in the case of Langlie's design. Unavoidably,
there is often a great disparity between initial estimates and actual values.
Consequently, it is reasonable to investigate how well designs and associated estimates
rebound from poor initial information. Four starting values were combined with three
gate widths in this study.

Finally, it was desired to examine the design and estimator performance under
different response functions. Of the five listed only the first four will be considered here.
Each has median 0 and standard deviation 1, with the obvious exception of the
Cauchy, whose quartiles were made equivalent to those of the normal.

Design and estimation over various test conditions


ANALYSIS 


One observation we made was that as the sample size increased, the precision of
the estimate improved regardless of the design and estimator used. An example of this
is given in Figure 2. We note here that $\sqrt{\mathrm{MSE}}$ is the root mean square
error. In addition, a case set is a pairing of a starting value and a gate width. The
reader need only know that cases 1-9 are the same in each situation and represent a good
mixture of possibilities.

With regard to noise, our study showed AVR and NMLE estimations to be
insensitive to normal noise and only mildly sensitive to asymmetric noise. In Figure 3 we
see a comparison of the $V_{50}^*$'s, the average of 700 simulated $V_{50}$'s for each
case set. In the case of asymmetric noise, the average is biased upward slightly toward
the longer tail of the response function. However, in Figure 4 we see little difference
among the three levels of noise for those same test conditions. We found Next Stress to
be sensitive to noise and particularly to asymmetric noise. In Figure 5 the effect of
noise on the precision of the Next Stress estimator is evident. In Figure 6, with the
actual median indicated by the arrow, note the apparent shift of the estimate population
toward higher velocities, the long tail of the asymmetric noise density.

The designs and estimators considered here are influenced by the shape of the
underlying response density. In Figure 7 $V_{50}^*$ comparisons are made with some zero
and normal noise cases. Note that the average of the estimator is approximately the true
value of the parameter except in the case of an exponential density and for two cases of
the Cauchy density. In Figure 8 these same case sets are compared by
$\sqrt{\mathrm{MSE}}$. We see that the uniform density results are somewhat higher than
the normal and that the Cauchy and exponential densities each have some extremely low
values. This is particularly interesting in the case of the exponential since its estimate
population mean was biased upwards. The reason for such behavior rests in the shape of
the densities.

Consider for a moment a density with point mass unity representing the critical
velocity probability mass. Then if a sequential design were used, the step for the next
design point would always be taken in the direction of the point of jump. Thus the
design would never make a wrong decision, that is, a decision moving the data collection
away from the median. Hence, it would converge in an ideal sense to the median. Of
course, in order to make a good estimate of the median, it is desirable to sample close to
it. Thus, a wrong decision is extremely detrimental over the first few rounds of small
sample experimentation, as it may prematurely cause sequential designs to decrease step
sizes, thus making it more difficult to climb back to the region about the median. For
the densities considered here there is a nonzero probability associated with making a
wrong decision.

Examine  Figure  9.  Here  all  four  densities  are  considered.  Suppose  for  a  normal 
density  the  sequential  design  is  currently  at  -2,  then  we  have  only  a  probability  of  .0228 
of  making  a  wrong  decision.  That  is,  there  is  only  probability  .0228  associated  with 
critical velocities below -2 which would cause a response to be recorded and,
consequently, a step down on the stress axis to the next design point. With this in mind, one
can explain the behavior of the designs for each response function.

Figure 2. Effect of sample size on precision.
Figure 3. Effect of noise on sample median.
Figure 6. Effect of asymmetric noise on Next Stress.
Figure 7. Response curve influence on median estimate.
Figure 8. Effect of response curve on NMLE estimator.
Figure 9. Response function densities.
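
The wrong-decision probability of .0228 quoted for the normal density at a design point of -2 can be checked directly. The following is a minimal sketch, assuming a standard normal response density; the Cauchy scale used for comparison is a hypothetical choice (quartiles matched to the standard normal) and is not taken from the paper.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal density."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def cauchy_cdf(x, median=0.0, scale=1.0):
    """Cumulative distribution function of a Cauchy density."""
    return 0.5 + math.atan((x - median) / scale) / math.pi


# Probability that the critical velocity lies below a design point at -2,
# i.e. the probability of a step further away from the median.
p_wrong_normal = normal_cdf(-2.0)

# Hypothetical Cauchy scale: quartiles matched to the standard normal
# quartiles (+/- 0.6745). This scaling is an assumption for illustration.
p_wrong_cauchy = cauchy_cdf(-2.0, scale=0.6745)

print(f"normal: {p_wrong_normal:.4f}")
print(f"Cauchy: {p_wrong_cauchy:.4f}")
```

Under these assumptions the heavy-tailed density gives a noticeably larger chance of a step away from the median at the same design point, which is the mechanism the text appeals to.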


For the Cauchy density, once the design was sampling close to the median, the
concentration of probability in that area was holding the design there. This gave rise to
the low √MSE in Figure 8. On the other hand, for case 2 in Figure 7 where sampling
began in the tail, the heavy tail of the Cauchy gave a relatively high probability of
going further out in the tail. When the design moved back toward the median,
estimation was weighted by the low probability response, resulting in V50* values well below
those of the other densities.

In  the  case  of  the  exponential,  most  of  the  probability  mass  is  contained  in  the 
interval  (-.69,  .69)  -  relatively  close  to  the  median.  Again,  once  the  design  reached  this 
area,  the  concentration  of  probability  was  likely  to  hold  it  there,  giving  rise  to  Figure  8 
results.  However,  when  the  design  did  wander,  it  could  only  wander  in  one  direction, 
thus  causing  the  V50*’s  to  be  higher  than  for  the  symmetric  distributions.  The  uniform 
and  normal  explanations  follow  along  these  same  lines. 
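
The interval (-.69, .69) can be related to the exponential density by a short calculation. This is a sketch under an assumption the paper does not state explicitly: a unit-rate exponential shifted so that its median sits at zero. Under that assumption the left end of the interval is the lower bound of the support, which is why the design can only wander in one direction.

```python
import math

RATE = 1.0  # assumed unit-rate exponential; the rate is not given in the paper
MEDIAN = math.log(2.0) / RATE  # median of an exponential density


def shifted_exponential_cdf(x):
    """CDF of an exponential density shifted so that its median is at 0."""
    t = x + MEDIAN  # undo the shift
    return 0.0 if t <= 0.0 else 1.0 - math.exp(-RATE * t)


lower, upper = -MEDIAN, MEDIAN
mass = shifted_exponential_cdf(upper) - shifted_exponential_cdf(lower)
print(f"interval = ({lower:.2f}, {upper:.2f}), mass = {mass:.2f}")
```

With these assumptions three quarters of the probability mass falls within the interval, consistent with the statement that most of the mass lies relatively close to the median.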

In  support  of  this  explanation  we  offer  as  examples  Figures  10-13.  In  each  figure 
the 700 V50’s are given in histogram form. Note that -1.1 and 1.3 bound the normal
V50’s whereas -2.5 and 1.8 bound the Cauchy V50’s. In addition, the sample estimate
population  appears  slightly  more  peaked  for  the  Cauchy  density  than  for  the  normal. 
Note  also  the  shape  of  the  sample  estimate  population  corresponding  to  the  exponential. 
It  is  skewed  to  the  right  but  at  the  same  time  very  peaked  about  the  median. 

One important idea resulting from these observations rests with the heavy tails of
the Cauchy. It is doubtful that, with historical small sample data, a normal density
could be discerned from a Cauchy with matching quartiles. Yet these simulation results
show that problems in estimation can result when heavy tails are present. Therefore,
the experimenter needs to be aware of this problem when picking starting values and
step sizes.

Thus  far  only  moderate  attention  has  been  given  to  the  estimation  procedures.  In 
general,  we  found  the  NMLE  and  AVR  methods  to  track  very  closely  over  a  wide  range 
of starting values and gate widths. Figure 14 shows an example of this in terms of
√MSE. However, Next Stress, with its sensitivity to noise environments, does not track
well  with  the  other  two  for  normal  and  asymmetric  noise;  an  example  is  given  in  Figure 
15.  It  should  be  noted  that  Next  Stress  is  the  intended  estimator  for  all  designs  except 
the  Langlie  which  uses  NMLE.  Over  the  wide  range  of  cases  NMLE  seems  to  be  the 
best  performer. 

The  comparison  of  designs  was  too  involved  to  address  in  the  time  allotted  for  this 
talk.  We  will  say  only  that  under  NMLE  all  the  designs  performed  similarly.  This  is  not 
to say that some are not better than others, but only that in this small sample
environment not enough rounds are available to show superiority where it is present.

Figure 10. Empirical estimate density for normal response
Figure 11. Empirical estimate density for Cauchy response
Figure 12. Empirical estimate density for uniform response
Figure 13. Empirical estimate density for exponential response
Figure 15. Comparison of NMLE, AVR and Next Stress estimators (axis label: Equivalent Case Sets).


SUMMARY


In summary, several important observations follow. First, the starting value and
gate width have a significant effect on √MSE. Second, the response function does
influence the design point selection and estimation. In particular, heavy tails could
adversely affect the estimate of V50. Third, sample size changes from 9 to 15 result in
an increase in precision of about 25%. Fourth, in noise environments, NMLE is the
preferred method of estimation regardless of design. In the absence of noise, there is no
clear difference among the three estimators. Last, there is no clear advantage in using
one design over another in terms of the quality of the estimate. However, certain
implementation considerations will help the experimenter choose one to suit his needs.


ABSTRACT

Human  factors  include  the  ways  in  which  people  acquire, 
process,  and  convey  information.  They  affect  the  quality  of 
people's  judgements  and  thus  become  a  concern  when  these 
judgments are being elicited for use as data. This paper
focuses on five human factors: question phrasing, conservatism,
inconsistency, overoptimism, and social pressures.
Techniques  for  detecting  and  reducing  the  occurrence  of 
these  human  factors  are  given  for  two  methods  of  eliciting 
subjective  data,  the  mail  survey  and  the  interactive  group 
method.  Techniques  for  structuring  the  elicitation  methods 
are  proposed  as  the  main  means  for  countering  the  occurrence 
of  human  factors. 


THE  HUMAN  FACTORS 

Human  factors  can  affect  the  quality  of  the  subjective  data  in  many  ways. 
Human  factors  include  the  ways  in  which  people  acquire,  remember,  process,  and 
present  information  that  inhibit  their  reaching  mathematically  optimal 
decisions.  The  human  acquisition  of  data  is  biased  because  humans  selectively 
learn  that  which  supports,  rather  than  opposes,  their  views  (Mahoney  1976, 
Hogarth 1980). For example, people are unconsciously drawn to acquire information
which supports, rather than refutes, their preconceptions (Mahoney 1976).
Then too, people can acquire faulty information because of the role that feedback
plays in the learning process. When people receive no feedback, delayed feedback,
or only partial feedback, as often occurs, they may draw incorrect conclusions
(Hogarth 1980). For example, scientists who often receive only partial confirmation
of their hypotheses are likely to consider this sufficient validation or
to believe those data points which support their theory and mentally dismiss the
others (Mahoney 1976). The information acquired is stored and may be later
accessed by the person during an elicitation session.

How easily such information can be accessed from memory also affects
people's judgments during an elicitation session. Concrete, catastrophic, or
widely publicized information is more easily accessible and thus more greatly
influences a person's judgment than less memorable information (Spetzler and
Stael von Holstein 1975, Hogarth 1980). For example, it is thought that the
League of Women Voters ranked the nuclear industry as posing the greatest
occupational hazards to its employees of any industry because of the
disproportionate amount of media coverage this industry had received.

The processing of data in the human mind, such as during an elicitation
session, is also subject to human factors. Generally, people have difficulty
processing more than seven pieces of information at a time (Miller 1956).
Typically, they will select a heuristic for solving a problem in a decision
situation which then influences the decision they reach. For example, managers
may focus on the major aspects of the problem and ignore the uncertainties and
complex interactions of factors to reach a decision (Bender et al., 1981). This
simplifying heuristic may point to a different decision than one which had
included all the complexities of the problem. In applying these heuristics,
people are likely to be inconsistent, thus further complicating the gathering of
quality subjective data. For example, the manager may have been forecasting the
completion date of a large project by adding together the blocks of time that
each major phase was likely to require. He may have forgotten to add in a phase
being done by a subcontractor, thus failing to consistently follow his own
heuristic.

Additional complications may enter as a result of the mode in which
participants are requested to give the judgments. For example, respondents may
give different judgments on a survey than they would in an interview situation
(Payne 1951). They might give varying judgments to different phrasings of the
same question (Payne 1951, Sudman and Bradburn 1982, Garden 1980). Then too,
they might give different judgments if they are giving them in "willingness to
gamble" or "probability" schemes (Winkler 1967, Hogarth 1980).

Due to the constraints of time, five human factors were selected for
discussion. These five factors are widely prevalent and often interrelated, as will
be described below. The five human factors include the effects of:

1) Presentation of the decision task and phrasing of the questions or response
options;

2)  Conservatism; 

3)  Inconsistency; 

4) Overoptimism; and

5)  Social  pressure. 

Evidence of the effect of the presentation of the decision task on the
individual's response has been documented by Tversky and Kahneman (1981). They
asked students which alternatives they preferred in gain and loss situations.
For example, students chose between: 1) a sure gain of $250; and 2) a 25% chance
of gaining $1000 or a 75% chance of gaining nothing. In the set of loss
alternatives, they chose between: 1) a sure loss of $750; and 2) a 75% chance to lose
$1000 or a 25% chance to lose nothing. The majority preferred the sure gain in
the first pair of options and the risky loss in the second pair. Thus, the
relative attractiveness of options varies when the same decision is framed in
different ways. Furthermore, individuals are generally unaware of the effect of
question framing and, if informed of it, uncertain of how to compensate for its
effect.
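
It may help to note that, with the dollar amounts quoted above, the sure option and the gamble in each pair have the same expected value, so the reversal of preference cannot be attributed to a difference in expected payoff. The following minimal sketch checks the arithmetic; the function name is illustrative only.

```python
def expected_value(outcomes):
    """Expected value of a gamble given (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Gain frame, using the amounts quoted in the text.
sure_gain = expected_value([(1.0, 250)])
risky_gain = expected_value([(0.25, 1000), (0.75, 0)])

# Loss frame.
sure_loss = expected_value([(1.0, -750)])
risky_loss = expected_value([(0.75, -1000), (0.25, 0)])

print(sure_gain, risky_gain)  # equal expected gains
print(sure_loss, risky_loss)  # equal expected losses
```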

In addition, there is evidence that the response mode, such as
probabilities or equivalent gambles, influences people's judgment (Winkler 1967, Hogarth
1980). For example, Winkler (1967) recommended that a "willingness to pay"
response mode be used because people gave more conservative, hence more
realistic, estimates using this response mode than using probabilities. Similarly,
the scales used for the responses, such as 1 to 10 or -5 to +5, can influence
people's judgments.

The effect of question phrasing has been shown most dramatically by Payne
(1951) through his use of the split ballot technique in survey questions. The
split ballot technique entails giving half of a survey sample one wording of a
question or response option and the other half another. For example, one wording of
a question might be, "Do you believe that X event will occur by Y time?" The
other wording might be, "Do you believe that X event will occur by Y time, or
not?" This second option is more balanced because it mentions both
possibilities. For this reason it would be likely to receive a higher
percentage of "no" responses. Often the difference measured by the split ballot
technique is 4-15% even when the rewording has been very slight.


Conservatism, or anchoring bias, involves the individual's tendency to
cling to their first judgment instead of adjusting it to reflect new
information. Sometimes this tendency is explained in terms of Bayes' Theorem as
the failure to adjust a judgment in light of new information as much as it would
be according to Bayes' mathematical formula. Spetzler and Stael von Holstein
(1972) and Armstrong (1981) describe how people tend to anchor to their initial
response, using it as the basis for later responses. For example, the subject
may use last year's sales as a starting point in predicting this year's
sales and fail to consider other points on this distribution independently from
this starting point. In addition, Ascher (1978) finds this problem to exist in
forecasting where panel members tend to anchor to past or present trends in
their projection of future trends. Ascher determined that one of the major
sources of inaccuracy in forecasting future possibilities, such as markets for
utilities, was the extrapolation from old patterns that no longer represented
the emerging or future patterns.
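
The Bayesian reading of conservatism can be made concrete with a small calculation. The sketch below is illustrative only: the prior, the likelihoods, and the respondent's revised judgment are hypothetical numbers, and the "adjustment ratio" is a convenience measure introduced here rather than one proposed in the literature cited above.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior probability of hypothesis H after observing the evidence."""
    numerator = p_evidence_given_h * prior
    denominator = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / denominator


def adjustment_ratio(prior, reported, posterior):
    """Fraction of the full Bayesian revision that the respondent made."""
    return (reported - prior) / (posterior - prior)


# Hypothetical elicitation: the respondent starts at 0.5, sees evidence
# that is four times as likely under H as under not-H, and revises to 0.6.
prior = 0.5
posterior = bayes_posterior(prior, p_evidence_given_h=0.8, p_evidence_given_not_h=0.2)
reported = 0.6

ratio = adjustment_ratio(prior, reported, posterior)
print(f"Bayesian posterior = {posterior:.2f}, reported = {reported:.2f}")
print(f"adjustment ratio = {ratio:.2f}")
```

A ratio near one would indicate a full Bayesian revision; a ratio well below one, as in this hypothetical case, is the pattern described as conservatism.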


Inconsistency occurs when individuals give contradictory judgments. For
example, they might give item A a higher rating than B with respect to goal X, B
a higher rating than C, and C a higher rating than A. Inconsistency is a common
problem because, as mentioned earlier, individuals are generally unable to apply
a consistent strategy, or heuristic, to a series of cases (Hogarth 1980).
Inconsistency in an individual's judgment can also stem from his remembering or
forgetting information during the process of the elicitation session. For
example, the individual may remember some of the less spectacular pieces of
information and consider these in making judgments later in the session. Or,
the individual may forget that particular ratings are only to be given in
extreme cases and begin to give them more freely towards the end of a session than
at the beginning.
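
The A-over-B, B-over-C, C-over-A pattern described above can be checked mechanically once a respondent's pairwise ratings are recorded. The following is a minimal sketch; the data layout (a set of "higher, lower" pairs) and the function name are assumptions made for illustration.

```python
from itertools import permutations


def find_intransitive_triples(prefers):
    """Return triples (a, b, c) where a beats b, b beats c, and c beats a.

    `prefers` is a set of (higher, lower) pairs taken from one
    respondent's ratings with respect to a single goal.
    """
    items = sorted({item for pair in prefers for item in pair})
    cycles = []
    for a, b, c in permutations(items, 3):
        # Report each cycle once, starting from its smallest label.
        if a < b and a < c and {(a, b), (b, c), (c, a)} <= prefers:
            cycles.append((a, b, c))
    return cycles


inconsistent = {("A", "B"), ("B", "C"), ("C", "A")}
consistent = {("A", "B"), ("B", "C"), ("A", "C")}

print(find_intransitive_triples(inconsistent))  # one cycle found
print(find_intransitive_triples(consistent))    # no cycles
```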


Overoptimism is sometimes referred to as the overestimation of
probabilities, overconfidence bias, or the underestimation of uncertainty. Overoptimism
is the giving of more optimistic judgments, such as in the form of
probabilities, than the person's data warrants. People tend to be overly optimistic of
the probability of some event occurring and often underestimate the uncertainty,
or the time and resources needed to make this event a reality. Thus, they give
error bars that are too narrow on these judgments (Capen 1975). Overoptimism can
stem from a variety of causes: 1) thinking at too general a level; 2) wishful
thinking; and 3) illusion of control. Armstrong (1975) and Hayes-Roth (1980)
have shown that people give higher, less realistic, probabilities when they
consider decision tasks in general than when they disaggregate them into their
component parts. For example, Armstrong (1975) asked straight Almanac questions
of one half of his sample. Of the other half, he asked the same Almanac
questions but broken into logical parts. For instance, the question "How many
families were living in the U.S. in 1970?" was asked as "What was the population
of the U.S. in 1970?" and "How many people were there in the average family
then?". The persons answering the disaggregated questions gave significantly
more accurate judgments.
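
The disaggregated form of the Almanac question amounts to a simple division of one answer by the other. The sketch below shows the recombination step only; the two answers used are hypothetical responses from a single respondent, not data from Armstrong's study.

```python
def families_from_parts(population, people_per_family):
    """Recombine the two disaggregated answers into a count of families."""
    if people_per_family <= 0:
        raise ValueError("people_per_family must be positive")
    return population / people_per_family


# Hypothetical answers to the two component questions, for illustration only.
population_answer = 200_000_000
family_size_answer = 4.0

estimate = families_from_parts(population_answer, family_size_answer)
print(f"implied number of families: {estimate:,.0f}")
```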

Wishful thinking occurs when an estimator's hopes influence his judgment
(Hogarth 1980). For example, a project manager in charge of a project may give
optimistic probabilities about completing it on schedule because he hopes this
will be the case. In general, people exhibit wishful thinking about what they
can accomplish in a given amount of time; they overestimate their productivity
(Hayes-Roth 1980).

Illusion  of  control  is  the  tendency  to  feel  greater  optimism  or  greater 
confidence  in  some  outcome,  if  one  has  been  involved  in  its  process  (Hogarth 
1980).  People  can  acquire  the  impression  of  having  more  control  over  outcomes 
simply by spending time analyzing a situation as in an elicitation session
(Langer 1975). Similarly, people perceive risks as being lower when they feel
that  they  are  in  control  of  a  process.  For  example,  people  perceive  less  risk 
when  they  are  driving  a  car  than  when  they  are  riding,  as  a  passenger,  in  a 
plane  (Rowe  1982). 


Social pressure induces individuals to slant their responses or to silently
acquiesce to what they believe will be acceptable to their group, superordinates,
institution, or society in general. Zimbardo, a psychologist, explains that it
is due to the basic needs of people to be loved, respected, and recognized that
they can be induced or choose to behave in a manner which will bring them
affirmation (1983). There is abundant sociological evidence of conformity within
groups (Weissenberg 1971). Generally, individuals in groups conform to a
greater degree if they have a strong desire to remain a member, if they are
satisfied with the group, if the group is cohesive, and if they are not a
natural leader in the group. Furthermore, the individuals are generally unaware
that they have modified their judgment to be in agreement with the group. One
mechanism for this unconscious modification of opinion is explained by the
theory of cognitive dissonance. Cognitive dissonance occurs when an individual
finds a discrepancy between thoughts he holds or between his beliefs and his
actions (Festinger 1957). For example, if an individual holds an opinion which is
in conflict with that of the other group members and he has a high opinion of the
others' intelligence, cognitive dissonance will result. Often, the individual's
means of resolving the discrepancy is by unconsciously changing his judgment to
be in agreement with that of the group (Baron and Byrne 1981).

Irving Janis's study of fiascos in American foreign policy (1972)
illustrates how presidential advisors often silently acquiesce rather than
critically examine what they believe to be the group's opinion. This tendency
has been called "group think", the "bandwagon tendency", or the
"follow-the-leader effect."

The effect of social pressure can also be seen in situations where the
individual is not in direct contact with others. Payne (1951) has provided
evidence that people give socially acceptable answers to survey questions. On
surveys, people claim that their educations, salaries, and job titles are better
than they are. More people claim subscriptions to socially acceptable magazines
and deny it to the lurid ones than subscription records support. Often there is
a 10% difference between what is claimed for "prestige" reasons and what
objectively is.

THE  METHODS 

Methods for eliciting expert opinion vary along several continuums: 1) the
number of participants; 2) the degree of interaction among participants and
between them and the session leader; 3) the degree of structure imposed on the
elicitation process; 4) the degree of participants' expertise; and 5) the degree
of "fuzziness" of the data being elicited.


For example, one method, the mail survey, involves no
interaction among respondents or between them and an interviewer.
Interaction is defined as any two-way communication after which the respondent
is allowed to change his judgment. When the respondent fills out a survey,
there is generally no interaction between him and his peers or between him and
an interviewer.

Another possibility, the Delphi method, can include any number of
respondents and allow for more interaction between respondents than the
mail survey. The respondents' interactions are controlled by the Delphi monitor
who sends each respondent the judgments of the others. The respondents are
allowed to adjust their judgments in light of this information. The process of
allowing respondents to change their judgments can go through any number of
iterations, even until consensus is reached. The RAND Corporation developed the
Delphi method to overcome some of the problems inherent in an interactive group
method, such as social pressures to conformity. For this reason, in the Delphi
technique, the respondents do not interact in a face-to-face situation.
Instead, the only contact they are supposed to have with one another is via the
mail. And then, the names and other identifying features are removed from the
judgments before they are circulated so that the origins of these judgments will
not unduly affect the recipients.
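
The monitor's role in the Delphi iterations described above can be sketched as a simple loop: collect judgments, remove identifying features, circulate them, and stop when the judgments are close enough. Everything specific in this sketch is an assumption for illustration: the numeric form of the judgments, the consensus tolerance, the respondent labels, and especially the revision rule, since the source does not say how respondents adjust. For simplicity the circulated list contains all judgments, including the recipient's own.

```python
import statistics


def anonymize(judgments):
    """Strip respondent names, keeping only the sorted judgment values."""
    return sorted(judgments.values())


def delphi(initial, revise, tolerance, max_rounds=10):
    """Iterate until the spread of judgments is within `tolerance`.

    initial: dict mapping respondent name -> numeric judgment
    revise:  function(own_judgment, anonymous_judgments) -> new judgment
    Returns the final judgments and the number of revision rounds used.
    """
    judgments = dict(initial)
    for rounds_done in range(max_rounds):
        spread = max(judgments.values()) - min(judgments.values())
        if spread <= tolerance:
            return judgments, rounds_done
        feedback = anonymize(judgments)  # what the monitor circulates
        judgments = {name: revise(value, feedback)
                     for name, value in judgments.items()}
    return judgments, max_rounds


def move_halfway_to_median(own, circulated):
    """Hypothetical respondent behaviour; not taken from the paper."""
    return own + 0.5 * (statistics.median(circulated) - own)


final, rounds = delphi({"r1": 10.0, "r2": 20.0, "r3": 40.0},
                       move_halfway_to_median, tolerance=2.0)
print(rounds, final)
```

The `max_rounds` cap reflects the point made in the text that consensus is optional: the process can be stopped after any number of iterations.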

Another method, the face-to-face interview, usually involves fewer
respondents than the mail survey. The respondents interact, singly,
with the interviewer during the course of the interview.

Fourthly, there is an interactive group method. In this method, a group of
three or more may be convened to give their judgments in the presence of one
another. The group sessions are generally monitored and structured by a
leader. For example, the leader may encourage group members to write down their
judgments and their reasoning. The leader may require that this information be
presented to the group and that a discussion follow. The interactive group
method can go through any number of iterations, as in the Delphi method, until
consensus, if it is desired, is reached.

For the sake of brevity, this paper will confine its discussion of the
detection and reduction of the human factors to two methods, the traditional
mail survey and the interactive group method. These two methods were selected
because they lie on opposite ends of the continuum with respect to the number of
participants and the degree of interaction involved.

The  five  human  factors  are  manifested  in  different  ways  in  the  various 
methods  so  the  means  by  which  they  can  be  detected  or  reduced  also  vary. 
For example, the effect of social pressure is manifested more strongly in the
interactive methods such as the face-to-face interview and the interactive group
method. Yet, because these methods are interactive, much of the detection of
social pressure can be done by a trained observer. This paper's approach to the
detection and reduction of human factors in elicitation methods is likely to
reflect the orientation of a cognitive or social scientist. The approach is to
perform a real time detection or counteraction of the human factors as they
occur during a session rather than a later mathematical adjustment of the data.

This paper advocates a structuring of the elicitation methods as a means
for reducing the occurrence of human factors. Structuring an elicitation method
involves controlling interactions, identifying the parts of the phenomenon on
which the respondents are being questioned, and defining them and the response
options, such as the scale. For example, an unstructured interactive group method
would resemble the usual meeting which occurs in the business world. A
structured version of the same method would have a program for when each member would
present his judgment and rationale to the group, when the floor was open for
discussion, and when the next round could begin. In general, the greater the
degree of structure imposed on the decision process, the simpler it is to
control for the occurrence of human factors. Often a method cannot be maximally
structured because each degree of structure imposed slows the process and
requires more patience or cooperation on the part of the participants. The client
may have deadlines and a fixed budget which limit the amount of structuring
which can be done. Thus, the amount of structuring which can be done often
involves tradeoffs between the quality of the data and its cost in time and
manpower.


The  Mail  Survey 


Detection  of  Human  Factors 


In  a  survey,  the  occurrence  of  human  factors  is  not  generally  detected 
while  the  individual  is  making  his  judgment  but  earlier  during  pilot  tests  or 
later  when  the  survey  is  analysed.  Three  factors,  the  effects  of  question 
phrasing,  social  pressure,  and  inconsistency,  can  be  detected  by  the  use  of  the 
split  ballot,  the  sleeper  option,  and  pilot  test. 

The effects of question wording and sequencing of options can be detected
by measuring the differences between the split ballot questions. The split
ballot technique is most commonly used for "yes-no" and other multiple choice
questions. Use of split ballot techniques in the past (Payne 1951) has shown
that people favor generally worded options over those which are highly specific.
In addition, they favor options which refer to the status quo over those
proposing new alternatives. Split ballot results have also shown that people favor
selecting numerical options which are located in the middle of a series whereas
they favor nonnumeric options which are located on either end of the series.
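
Measuring the difference between the two halves of a split ballot reduces to comparing two proportions. The sketch below computes the difference and, as an addition not discussed in the source, a pooled two-proportion z statistic to indicate whether the difference is larger than sampling variation alone would suggest. The response counts are hypothetical.

```python
import math


def split_ballot_difference(yes_a, n_a, yes_b, n_b):
    """Difference in "yes" proportions between two wordings, with a z statistic."""
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / standard_error if standard_error > 0 else 0.0
    return p_a - p_b, z


# Hypothetical counts: 200 respondents received each wording.
difference, z = split_ballot_difference(yes_a=130, n_a=200, yes_b=110, n_b=200)
print(f"difference = {difference:.1%}, z = {z:.2f}")
```

In this hypothetical case the difference is 10 percentage points, which falls within the 4-15% range the text reports as typical for split ballot comparisons.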

Social pressures to give the most acceptable response can also be detected
by use of the split ballot technique. One wording on half the surveys can state
the options bluntly, the other can contain face-saving phrases to encourage
people to check the response which is most descriptive of their thoughts or
actions. A face-saving option often encourages the respondent to admit that he
does not have X knowledge or Y socially-desirable possession at this time by
allowing him to state that he plans to acquire them in the future.

Another common area for the effects of social pressures to emerge is in
people's unwillingness to admit ignorance, to check the "I don't know" option.
If identification of knowledgeable respondents is important, a different
technique can be used to get a better indication of people's knowledge than simply
excluding those who selected the "Don't know" response. A sleeper option that
sounds plausible but which does not exist in reality can be inserted into the
series of bona fide options. For example, on a survey of public opinion of
nuclear reactors a "fast water reactor" might be inserted between a "light
water" and a "breeder." The number of people who select the sleeper option can
be added to those who marked the "Don't know" option and excluded from the pool
of supposedly knowledgeable respondents.
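
The sleeper-option screen described above can be applied as a simple filter once the surveys are tabulated. This is a minimal sketch; the response layout, the respondent labels, and the function name are hypothetical, while the option names follow the reactor example in the text.

```python
def knowledgeable_respondents(responses, sleeper_options, dont_know="Don't know"):
    """Return ids of respondents who checked neither a sleeper nor "Don't know".

    responses: dict mapping respondent id -> set of options checked
    """
    excluded_marks = set(sleeper_options) | {dont_know}
    return sorted(rid for rid, checked in responses.items()
                  if not (checked & excluded_marks))


# Hypothetical tabulated responses; "fast water" is the sleeper option.
responses = {
    "r1": {"light water", "breeder"},
    "r2": {"fast water"},
    "r3": {"Don't know"},
    "r4": {"breeder", "fast water"},
}

pool = knowledgeable_respondents(responses, sleeper_options={"fast water"})
print(pool)
```

Respondents who checked the nonexistent option are excluded along with those who checked "Don't know", leaving only the first respondent in the pool.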

Inconsistency in people's responses to surveys is more difficult to detect
than the two above mentioned effects. Inconsistency could conceivably be
detected by the use of redundant questions but this approach poses problems. If
the redundant question is an exact repetition, it can annoy people because they
wonder why they are being asked the same question again. Yet, if the question
is asked with a new wording, respondents may give different answers simply
because of the difference in phrasing. Inconsistency can occur because the
individual has not applied his heuristic consistently, has forgotten
instructions or definitions, or has remembered different incidents as he progressed
through the survey. An intensive interview type of pilot test can be used to
check the survey instrument for these problems. For example, one set of these
pilot tests revealed that individuals had forgotten the instructions about half
way through the selection of many options. The respondents were supposed to
mark their areas of knowledge on a list spanning two pages. Instead, by the
second page, one fifth of the pilot sample had checked areas in which they would
have liked to have had knowledge.


This type of pilot test is the only one, to my knowledge, that can be used
to track people's thinking, and their consistency, through a survey. I adapted
several ethnographic interviewing techniques to create this pilot test method.
These techniques gather two types of information: 1) how the respondent
progresses through the survey, that is, which sections he looks at, in what
order, and for how long, his general impressions, and when or why he decides to
fill out the survey and to turn it in; and 2) how the respondent specifically
interprets each direction, question, and response option.


To obtain the first type of information, the interviewee is asked to handle
the survey as he would naturally, if no observer were present. The interviewee
is asked to "think out loud" and to mention his impressions. Generally,
individuals will skim the cover letter and flip through the rest of the survey.
As the individual flips through the survey he might state, "I have problems with
this page and I would probably let the survey sit on my desk for several days to
decide whether to fill it out." While the interviewee pages through the survey,
his pauses and gestures, particularly those indicating confusion or anxiety, are
noted by the monitor. If the respondent has paused or shown some emotion during
his review of a particular section, specific questions will be asked, such as
"What was your feeling when you read this?"


To obtain the second type of information, the respondent is asked to
paraphrase, in his own words, the meaning of each direction, question, and
response option. This information allows the monitor to track the respondent's
interpretation of each part of the survey.

Structuring the Method to Reduce the Occurrence of Human Factors


As mentioned earlier, structuring any elicitation method can facilitate the
counteraction of many human factors. The following section contains some
recommendations on how to set up a mail survey to obtain better quality subjective
data by controlling for the intrusion of some human factors.

The first stage in developing the mail survey can have an effect on the
amount of inconsistency which shows up later in the respondents' judgments.
Often seeming inconsistencies in the respondents' answers arise from their
viewing the phenomena in a different manner than the way in which they have been
presented on the survey. Because the survey does not generally encourage them
to explain the view or assumption which allowed them to make the puzzling
responses, their responses are dismissed as inconsistent and unreliable. For
this reason, it is recommended that the creator of the survey first talk
extensively to a sample of those who will be surveyed to learn what relationships,
causes and effects, they believe enter into the problem. For example,
respondents from a utility might believe that the future of their utilities market is
tied to the nation's gross national product (GNP). If the task is to elicit
their projections for a utilities market in year 2000, then the questions should
define different levels of GNP. For instance, "Assuming that the GNP is X in
the year 2000, what would you predict the market for Y to be?"

Careful composition of the questions can reduce the occurrence of three
effects: 1) inconsistencies which arise from the respondents' confusion, 2)
phrasing, and 3) social pressure. If the survey is targeted for the general
public, the use of Basic English is recommended as one means of minimizing
misunderstandings. Basic English is a vocabulary of approximately 1000 words
that are understood by most people who possess a high school education. Payne
(1951) provides a list of these words. He also provides a list of words which
have been found to possess different meanings for different people. For
example, "this year" means the present fiscal year to some, the present calendar
year to others, and this coming year to still others. It is recommended that
the use of these problem words or phrases be avoided in the interests of
clarity. In addition, it is recommended that question lengths not exceed 25
words because respondents' comprehension has been found to fall off around that
point (Payne 1951).

As mentioned earlier, split ballot techniques can be used to detect or
counteract  the  effect  of  phrasing  and  ordinality.  For  example,  response  options 
can  be  placed  first  or  last  in  half  the  surveys  and  in  the  middle  in  the  other 
half  to  counter  the  effect  of  ordinality. 
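
A split ballot of this kind can be sketched as a random division of the mailing list into two forms that differ only in where a given response option appears. The sketch below is an illustration under stated assumptions: the helper names `split_ballot` and `place_option`, the fixed random seed, and the five-option list are hypothetical and are not taken from the surveys discussed here.

```python
import random


def split_ballot(respondent_ids, seed=0):
    """Randomly split respondents into two form groups of (nearly) equal size."""
    ids = list(respondent_ids)
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"form_A": sorted(ids[:half]), "form_B": sorted(ids[half:])}


def place_option(options, target, position):
    """Return a copy of options with target moved to 'first', 'middle', or 'last'."""
    rest = [o for o in options if o != target]
    index = {"first": 0, "middle": len(rest) // 2, "last": len(rest)}[position]
    return rest[:index] + [target] + rest[index:]


options = ["A", "B", "C", "D", "E"]          # hypothetical response options
groups = split_ballot(range(1, 11))          # ten hypothetical respondents
form_a = place_option(options, "C", "first")   # option of interest placed first
form_b = place_option(options, "C", "middle")  # same option placed in the middle
print(groups, form_a, form_b)
```

Comparing how often the option of interest is chosen on the two forms then gives a rough indication of any ordinality effect.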

If the pilot test of the survey indicated that prestige was an issue on
some questions, then face-saving wordings can be used to obtain a better
representation of people's opinions. Generally, admission of ignorance involves the
loss of prestige, so the "Don't know" option should be carefully worded. "No
set opinion at this time" is an example of a face-saving wording.

The  presence  and  placement  of  definitions  is  another  technique  which  can  be 
employed  to  reduce  the  occurrence  of  human  factors,  in  this  case,  inconsistency. 
Definitions  include  descriptions  of  the  phenomena,  the  time  frame  in  which  the 
respondent is to consider these, and the scale in which he is to respond. As an
individual progresses through a survey, the definitions become blurred in his
mind. He relies on his memory of these definitions and often arrives at a
working definition which deviates from the original written one. For this reason,
definitions should be incorporated into the question or they should immediately
precede it. For example, "What [illegible] probability that the motor generator
will [illegible] of time by calendar year September 1, [illegible]" [illegible]
mentioned as part of the question. The same treatment [illegible] of these
descriptors, "nearly certain", "highly probable", and [illegible] definitions
are used in [illegible] they give the same rating.

Another structuring technique, hierarchically organizing the survey, is
helpful in countering [illegible] overoptimism (Meyer 1982a). Organizing the
survey in a hierarchical manner generally entails beginning with specific
[illegible] and progressing to more inclusive questions. The respondent is not
[illegible] major questions until his memory has [illegible] not just the
easily accessible information. [illegible] shown in the Almanac example, to
counter people's tendency toward overoptimism.

The  Interactive  Group  Method 

Detection  of  Human  Factors 

The effects of phrasing, [illegible] can be detected during elicitation
sessions by [illegible] monitoring this process (Meyer 1982b). Generally,
[illegible] presence of these [illegible]. (This mode of [illegible] group
members' verbalization of their thoughts will be given in the next
section.)

The respondents' verbal feedback on their interpretations of [illegible]
allows misunderstandings to be caught [illegible] be detected during the
session. If an individual continuously holds to his initial judgment even
though there has been a discussion [illegible] opportunity to revise his
judgment, he is a likely [illegible]. Inconsistency can be detected when
members [illegible] a comparable one earlier or when their interpretation of a
definition appears to change.

The problem of [illegible] group method than in the face-to-face interview
or the mail survey. [illegible] whereas the others tend to be
one-[illegible]. Thus, with the usual group method, there is the chance of the
members forgetting information, instructions, and definitions over the
[illegible] of time. One inconsistency which can emerge is the ease with which
[illegible] is applied. For example, the respondent [illegible] scale with
varying frequency through time. Fatigue during a session seems to contribute to
the occurrence of inconsistencies, perhaps because people are not thinking as
carefully. (Fatigue is indicated by briefer responses and by the degree of the
participants' horizontal inclination.)

The degree of inconsistency can be detected by use of Bayesian-based scoring
and ranking techniques. The group members' judgments can be entered into a
scoring and ranking program, such as that of Saaty's Analytic Hierarchy
Process, to obtain a rating of their consistency (Saaty 1980).
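
As one illustration of how such a consistency rating might be computed, the sketch below follows the consistency ratio commonly associated with Saaty's pairwise comparison matrices: the principal eigenvalue is estimated, converted to a consistency index (lambda_max - n)/(n - 1), and divided by a random index for matrices of the same size. The function names and sample matrices are hypothetical, the random index values are those usually quoted from Saaty's tables, and the program actually used in the group sessions may have differed.

```python
# Random consistency index (RI) by matrix size n, as usually quoted from Saaty.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigenvalue(matrix, iterations=100):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    vector = [1.0 / n] * n
    eigenvalue = float(n)
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(n)) for row in matrix]
        eigenvalue = sum(product)  # valid because vector is normalized to sum 1
        vector = [p / eigenvalue for p in product]
    return eigenvalue


def consistency_ratio(matrix):
    """Consistency index divided by the random index for the matrix size."""
    n = len(matrix)
    lam = principal_eigenvalue(matrix)
    consistency_index = (lam - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgments: entry [i][j] is the rated importance of i over j.
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
inconsistent = [[1, 2, 1 / 4], [1 / 2, 1, 2], [4, 1 / 2, 1]]

cr_consistent = consistency_ratio(consistent)
cr_inconsistent = consistency_ratio(inconsistent)
print(round(cr_consistent, 3), round(cr_inconsistent, 3))
```

A ratio near zero indicates judgments that agree with one another; a conventional rule of thumb treats ratios above about 0.10 as a signal to revisit the judgments.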

Social pressures can also be detected by real-time observations.
Generally, if consensus is easily obtained, no difference of opinion is voiced,
and the group members appear to defer to another member of the group, group
think is a strong possibility. Social pressures can come from the members of
the group or from the institution sponsoring the decision session. The
institution may favor a particular decision outcome and apply pressure on the group
members to this end.

Structuring  the  Method  to  Reduce  the  Occurrence  of  Human  Factors 

The first stage of the interactive group method, a free association
exercise, can be used to counteract the members' tendency toward conservatism. The
free association exercise involves having group members mention any and all
elements which might have bearing on the phenomena in question. For example, in
considering a problem on which technologies should be exported from the United
States, some of the major elements a free association might have produced would
be the military, economic, political, and technological significance of the
export items. The elements mentioned during a free association are usually
recorded for the group members to see. Later, the group members will work from
these in developing a model of the decision situation. The purpose of the free
association exercise is to start with a wide set of possibilities and to narrow
these to the pertinent ones. The free association exercise is intended to counter
the human tendency to anchor narrowly on past or present cases which may not hold
in the future.

The next stage, the organization of these elements into a model, has
bearing on how much inconsistency will be observed when the members are giving their
judgments. Highly inconsistent judgments (as determined by ear and by Bayesian
techniques) often indicate a need to restructure the model to better represent
the members' view. This stage of the method is the most time consuming because
the participants are not always conscious of how they mentally model the
phenomena. Then too, sometimes they are so conscious of some information that
they fail to convey it for incorporation into the model.

The elicitation phase can be structured to include various techniques for
countering the effects of social pressure, conservatism, and overoptimism.
Perhaps the most critical of all of the structures placed on the elicitation
process is the requirement that participants verbalize their judgments and
their reasons for giving such judgments. As mentioned earlier, this verbal
feedback allows the method to be monitored for the intrusion of many human
factors. For example, if group members appear to exhibit group think, the
method can be structured to promote the opposite bias, conservatism. Groups
where conformity is likely to be a problem are cohesive groups, groups where the
people have worked together before, or groups where there is a dominating
leader (Janis 1972). When group members are required to write down and then report
on their judgments and rationale, they are more likely to get attached to their
judgments and defend them when the discussion begins. I would recommend having
each person record and read his judgments before opening the floor to
discussion and allowing people to modify their judgments. If there is a strong
official or even a natural ex officio leader in the group, that individual should
be asked to give his judgments last so as not to influence the other group
members. In addition, if there is an official leader of the group, he or she
should be encouraged to be nondirective during the meetings. An explanation of
the group think phenomenon usually suffices to convince them that better
discussions and data will result from their refraining from "leading."

If, on the other hand, group members appear to be too narrow, or anchoring,
in their thinking, a series of extreme scenarios can be introduced for their
consideration.

If overoptimism has been detected, the group members can be led to think
in greater detail about the elements of the phenomena. This is done in much the
way that the Almanac questions were disaggregated for the survey population.

Another technique, the reviewing of definitions, can help reduce
respondents' tendency to be inconsistent because of faulty memory. If, at the
beginning of every session, definitions are verbally reviewed, members will be
more consistent in their definitions through time and between themselves. In
addition, each time that their judgment is requested, a statement of the
question, inclusive of definitions, can be given. For example, "What rating would
you give to the importance of element X over Y to reaching goal Z?" Their copy
of the scale, in this case a Saaty Pairwise Comparison, should include
descriptors or definitions of the ratings.

Another technique for reducing inconsistency is to have the group members
monitor their own consistency. For this task, they should have copies in front
of them of their judgments and response scale. A matrix structure of the
criteria on which the elements are being judged, the elements, and the
judgments works well for this task (Meyer 1982b). Often the group members will view
an element in a different light than they did earlier and wish to change the
earlier judgment to be in line with their current thinking. If their reasoning
does not violate the logic of the model or of the definitions, they should be
allowed to make the change. Sometimes, consideration of a new element makes
them aware that the model and accompanying definitions did not realistically
portray this part of the phenomena. Parts of the original model will need to be
changed and some of the process of giving judgments will need to be repeated.

1. INTRODUCTION

Modern Army weapon systems tend to be sophisticated, complex, and
expensive. The complexity and sophistication are necessary to meet the
projected threat and lead to the high cost of both development and
procurement. There is also typically an urgency to field the new, more capable
equipment as soon as possible. Because of this urgency, the Army has adopted
the Single Integrated Development Test Policy wherein government, as well as
contractor, testing is utilized to find problems and determine the
effectiveness of corrective actions.

The Army acquisition process recognizes that most weapon systems are
not mature when subjected to government tests by allowing for reliability
growth throughout the development phase. Before proceeding into the production
phase, however, there is a requirement to demonstrate that the materiel has
achieved the reliability threshold established. Ideally, this demonstration
is accomplished by sufficient testing of the final configuration to provide
statistically valid estimates. Experience has shown that programs which
rely on this technique generally do not achieve the reliability objectives
within the allocated resources and time. The second best alternative is to
design the tests in a test-fix-test fashion that allows for tracking of
reliability by using accepted and proven self-purging reliability growth
methodology, such as the AMSAA model. This technique has the advantage of
using all test data, thus increasing the applicable sample size over the first
alternative, and is successfully used by AMSAA in the reliability evaluation
of many Army weapon systems. This technique, in fact, is the preferred
technique for assessing reliability at any point in the development cycle.

The ability to use this technique, however, is contingent upon several
factors, one of which is a requirement to implement the corrective action in
a timely manner on the test samples. Unfortunately, it is not always possible
to meet the conditions necessary to use the AMSAA Reliability Growth Model, or
a similar model, due to the time and money constraints previously discussed;
such was the case for the M1 Abrams tank during its Full Scale Engineering
Development Phase. In such cases, alternate methods must be used to provide
credible estimates of the reliability of the final design at the end of
development.

This paper describes the process used to assess the reliability of the
M1 Abrams tank, and provides comparisons of these estimates to estimates
obtained from later tests of the same configuration. Further, lessons learned
during this evaluation are presented along with a brief description of improved
and formalized procedures developed by AMSAA in response to these lessons
learned.

2. M1 RELIABILITY ASSESSMENT


The M1 Abrams tank had a combat mission reliability requirement of 320
Mean Miles Between Failure (MMBF), to be demonstrated during the Initial
Production phase of the acquisition cycle. Recognizing that corrective actions
for many of the design faults detected during development test would not be
implemented until after test was complete, a threshold of 272 MMBF was imposed
on the system to be demonstrated at the completion of the Full Scale Engineering
Development (FSED). Early in the FSED testing, it became apparent that the
initial design possessed a reliability much less than that necessary to progress
into production. With approximately forty percent of the FSED testing complete,
the tank was demonstrating an "as-tested" MMBF of 120. "As-tested" MMBF was
computed, assuming an exponential distribution, by dividing the total test
miles by the total number of failures. At that point in time, although failure
analyses had been conducted, very few proposed design changes had resulted in
hardware changes on the test samples. In fact, due to the desire to implement
corrective action on the test samples as soon as possible, some of the changes
to the tank hardware had actually resulted in an increase in total system
failure rate and had to be removed. All attempts to fit reliability growth
tracking curves were unsuccessful. Since an Army decision review was scheduled
shortly, an alternate method had to be considered to assess any growth in
design reliability, and to further assess the potential reliability considering
proposed, as well as implemented, design changes.

To provide a continuing assessment of the M1 Abrams tank reliability, it was
decided to conduct periodic Reliability Assessment Conferences as authorized by
AR 702-3. This conference, composed of representatives of the materiel developer,
combat developer, development test independent evaluator and operational
evaluator, was charged with the responsibility of estimating the reliability of the
current configuration and projecting the reliability when all identified, but not
implemented, corrective actions were taken. In order to accomplish this mission,
procedures were developed and agreed to by the conference principals.

2.1  Procedures  for  Estimating  "Demonstrated"  Reliability 

The term "demonstrated" reliability as used in current Army Regulations has
been shortened from what the M1 Assessment Conference termed "reliability adjusted
for demonstrated corrective action." Failure rate adjustment for this estimate
is made only if there is clear evidence, from representative testing, that a
reduction in failure rate has in fact taken place. The following procedure
was used by the assessment conference to estimate "demonstrated" reliability:

° Establish that design change has been subjected to representative test.

° Determine that design change had positive effect on reliability.

° Estimate effectiveness of corrective action.

° Adjust failure rates and compute adjusted reliability.

2.2 Procedures for Estimating Projected Reliability

The projected reliability estimate allows for adjustment of failure rates
for proposed as well as demonstrated design changes. As allowed for in
AR 702-3, the combat developer and operational evaluator chose not to
participate in this projection, other than to offer opinions during discussion. Thus,
for the M1 program, projections were made by AMSAA and the M1 Program Manager's
Office using the following procedures:

° Adjust failure rates for demonstrated corrective actions in accordance
with procedures outlined in paragraph 2.1.

° Using engineering judgement and experience with similar systems,
estimate whether or not proposed change will decrease failure rate.

° Using engineering judgement and experience with similar systems,
estimate effectiveness of proposed modifications.

° Adjust failure rate and compute projected reliability.
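
The adjustment steps in paragraphs 2.1 and 2.2 can be sketched numerically. The sketch below assumes that an effectiveness factor is the fractional reduction in a failure mode's failure rate, so that each mode's observed failure count is scaled by one minus its factor before the MMBF is recomputed. That interpretation, the function names, and all of the numbers are assumptions made for illustration; they are not M1 test data.

```python
def as_tested_mmbf(total_miles, total_failures):
    """Point estimate under an exponential assumption: miles / failures."""
    return total_miles / total_failures


def adjusted_mmbf(total_miles, failures_by_mode, effectiveness):
    """Scale each mode's failure count by (1 - effectiveness factor).

    Modes with no credited corrective action keep their full count.
    """
    adjusted_failures = sum(
        count * (1.0 - effectiveness.get(mode, 0.0))
        for mode, count in failures_by_mode.items()
    )
    return total_miles / adjusted_failures


# Hypothetical illustration data.
miles = 12000.0
failures = {"track": 40, "turbine": 30, "fire control": 30}
credited_fixes = {"track": 0.5}  # assumed: fix removes half of track failures

as_tested = as_tested_mmbf(miles, sum(failures.values()))
adjusted = adjusted_mmbf(miles, failures, credited_fixes)
print(as_tested, adjusted)
```

Under these assumptions a "demonstrated" estimate would credit only fixes verified by representative test, while a projected estimate would also include factors for proposed changes.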

It is evident from the agreed-to procedures that significant judgement was
inherent in estimation of both the demonstrated and projected reliability.

In order to maximize the information available to make this judgement, a
requirement was placed on the prime contractor to prepare and provide a
document to the assessment conference principals at least two weeks prior
to the conference detailing:

° Results of failure analyses.

° Results of all testing (before and after corrective action). If testing
was other than on test samples, the contractor was required to detail
conditions of test.

° Proposed effectiveness factor and rationale.

Upon receipt of the contractor documentation, the AMSAA RAM analyst would
provide the information, without the contractor's effectiveness estimates,
to engineers with experience in the area of interest and ask the following
questions:

° Based on the contractor presentation, is there evidence that design
change will result in lower failure rate?

° What is your estimate of the effectiveness of the corrective action,
expressed in terms of reduction in failure rate? Provide rationale.

° Could correction of this failure mode result in other failure modes?

° What, in your opinion, is the most likely failure mode and frequency?

This package would normally be reviewed by three engineers independently.
The RAM analyst would assimilate the responses; if in close agreement, the
responses would be accepted as appropriate; if not in close agreement, the
analyst would discuss the differences with each engineer until the differences
were completely understood or a consensus was reached. The analyst would
then discuss the results with his supervisor and they would jointly agree to
a position for the conference. This modified Delphi approach resulted in a
range of effectiveness factors and rationale for discussion at the assessment
conference.
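
The analyst's screening of the independent reviews can be sketched as a check on the spread of the engineers' estimates. In the sketch below, the `review_estimates` helper and the sample estimates are hypothetical, and the 0.10 tolerance for "close agreement" is an assumption, since no numerical criterion is stated.

```python
def review_estimates(estimates, tolerance=0.10):
    """Return (low, high, needs_discussion) for independent effectiveness estimates.

    needs_discussion is True when the spread exceeds the assumed tolerance.
    """
    low, high = min(estimates), max(estimates)
    return low, high, (high - low) > tolerance


# Hypothetical effectiveness factors from three engineers for two fixes.
close = review_estimates([0.60, 0.65, 0.68])
divergent = review_estimates([0.30, 0.60, 0.80])
print(close, divergent)
```

The low and high values correspond to the range of effectiveness factors carried forward for discussion at the conference.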

The assessment conference was conducted in a democratic process, with
open discussion by all principals. A majority vote (3 of 4) was required to
consider corrective action demonstrated. If considered demonstrated, the
effectiveness factor was then agreed to by voting. Because of the work done
at home station, the AMSAA position was normally accepted, particularly if
the estimate was close to the estimate provided by the contractor through
the Program Manager's Office representative.

2.3 Results of M1 Assessment

The  above  procedures  were  used  prior  to  the  Army  review  mid-way  through 
the  development  test  program.  At  that  time,  results  of  the  assessment  were 
as  follows: 


                     MMBF

    As Tested         120
    Demonstrated      145
    Projected         256

The demonstrated estimate was not vastly different from the "as-tested"
estimate for two reasons: (1) the as-tested estimate included some experience
with corrective actions implemented on the test samples and (2) very few of the
proposed corrective actions had been tested. Although the tank was
demonstrating reliability well below the requirement, a go-ahead decision was granted
based on a thorough discussion of the corrective actions identified and the
estimates provided by the assessment conference as to the effectiveness of
these corrective actions.

These procedures were used during the remainder of the FSED and Low Rate
Initial Production test with the following results:

                                 Mean Miles Between Failure
                                 As-tested     Demonstrated

    Extended FSED (Phase 1)         234            299
    Extended FSED (Phase 2)         308            326
    Initial Production (1)          278            351
    Initial Production (2)          324            351

(1) Includes Early Production Process Problems

(2) Excludes Early Production Process Problems

The configuration of the tank at the beginning of the Extended FSED (Phase 1)
was essentially the same as that for which a projected estimate of 256 MMBF
was made for the Army review. For all other phases of the test program, the
configuration at the beginning of the phase is essentially the same as that
for which "demonstrated" estimates were made during the preceding phase. For
example, the estimated value for Extended FSED (Phase 2) was 299 MMBF based
on Phase 1 testing; the actual as-tested value for Phase 2 was 308 MMBF.

It is of interest to note that the estimated value, in most cases,
overestimated the "as-tested" estimate. It was observed that the greatest
reason for this was the occurrence of new failure modes, for the most part not
related to any corrective action. It was also apparent that there had been
no provisions in the estimates to account for quality assurance and
production process problems inherent in the start-up of a new production process.
Historically, this start-up process has resulted in approximately a 10
percent reduction in MMBF.
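
The comparison of carried-in estimates with later as-tested values can be laid out as a short calculation. The sketch below uses the MMBF values reported in Section 2.3, pairing each phase with the estimate made for its starting configuration; the variable names are hypothetical, and applying the roughly 10 percent start-up reduction to the 326 MMBF estimate is shown only as an illustration of how such an allowance might be made.

```python
# (test phase, estimate carried into the phase, as-tested MMBF from the phase)
comparisons = [
    ("Extended FSED (Phase 1)", 256, 234),
    ("Extended FSED (Phase 2)", 299, 308),
    ("Initial Production, including early process problems", 326, 278),
    ("Initial Production, excluding early process problems", 326, 324),
]


def as_tested_ratio(estimate, as_tested):
    """A ratio below 1.0 means the carried-in estimate was optimistic."""
    return as_tested / estimate


ratios = {phase: round(as_tested_ratio(est, obs), 3)
          for phase, est, obs in comparisons}
overestimates = sum(1 for _, est, obs in comparisons if est > obs)

# Assumed start-up allowance: the roughly 10 percent historical MMBF reduction.
startup_adjusted = round(326 * (1 - 0.10), 1)

for phase, ratio in ratios.items():
    print(f"{phase}: {ratio}")
print(overestimates, startup_adjusted)
```

Three of the four comparisons show the estimate above the later as-tested value, which matches the observation that the estimates were optimistic in most cases.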

Overall,  the  process  worked  well.  Even  with  the  recognized  problems, 
the  estimates  obtained  using  expert  opinion  were  within  the  "statistical 
noise"  of  the  estimates  obtained  from  further  testing  of  the  same  configuration. 

3.  LESSONS  LEARNED 

Although  the  estimates  obtained  by  using  the  procedures  discussed  were 
very  close  to  values  actually  demonstrated  later,  several  problems  were  noted 
with  the  procedures. 

° There is typically a wide variation in the estimates provided by experts
on the effectiveness of proposed corrective action. This paper will not
attempt to discuss reasons for this variation, but simply note that it did
exist.

° Intuitively, it was felt that giving credit for corrective action taken
for low failure rate modes resulted in an optimistic estimate of reliability.

° The assessment conference procedure allows for control of the conference
by the "strong" individual (most persuasive), not necessarily the one with the
most knowledge. Estimates arrived at by the conference may thus not have the
benefit of the representative input of all experts.

On the positive side, the M1 experience demonstrated that credible
estimates can be made using expert opinion, and that low risk decisions can
be made in a timely manner without the requirement to test the final
configuration for prolonged periods.

The contractors (prime and subs) possess the greatest expertise for
the particular design. Contracts must be written to take advantage of this
expertise, and in such a manner to allow for significant government
interaction, to include the independent evaluators. A conscientious effort is
required by the government community, to include use of Government laboratories
and independent consultants, to properly assess corrective actions.

4.  IMPROVEMENT  IN  PROCEDURES 

The two areas of greatest concern that evolved from the M1 assessment were
the uncertainty of the fix effectiveness estimates, particularly for the
projected reliability estimates, and the realization that projections were
probably optimistic because of giving credit for corrective actions for low
failure rate failure modes without considering the effect of other unseen
failure modes. Discussions of these perceptions with personnel from the
AMSAA RAM Methodology Office resulted in further investigation of the perceived
problems and publishing of several reports to document improved methodology and
procedures. Following is a brief synopsis of the published reports with
comments on how they may be used to improve future assessments.

4.1  AMSAA  Technical  Report  No.  357,  "An  Improved  Methodology  for 
Reliability  Growth  Projection",  Larry  H.  Crow,  June  82. 

In this report, Dr. Crow showed that even when the effectiveness factors
are known exactly, the adjusted procedures used in the M1 assessment would
still overestimate the system reliability. He further was able to
mathematically determine the bias term:

$$B(T) = K\,h(T),$$

where $K$    = average effectiveness factor

      $h(T)$ = average rate of occurrence at time $T$ of new failure modes
               for which corrective action will be taken

Maximum  likelihood  methods  are  used  to  estimate  h(T). 
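The bias term can be illustrated with a minimal sketch. It assumes a power-law form $h(t) = \lambda \beta t^{\beta - 1}$ for the rate of occurrence of new failure modes, a form that is not stated in this paper and is adopted here only for illustration; under that assumption the maximum likelihood estimates reduce to the closed form used below. The function names, the first-occurrence times, and the value of $K$ in the usage note are hypothetical.

```python
import math


def estimate_h(first_occurrence_times, total_time):
    """Estimate h(T) from the times at which new failure modes first appeared.

    Assumes (for illustration only) a power-law intensity
    h(t) = lam * beta * t**(beta - 1) observed over (0, total_time].
    Under that assumption the maximum likelihood estimates are
        beta_hat = M / sum(ln(T / x_i)),   lam_hat = M / T**beta_hat,
    so that h_hat(T) = M * beta_hat / T.
    """
    m = len(first_occurrence_times)
    beta_hat = m / sum(math.log(total_time / x) for x in first_occurrence_times)
    return m * beta_hat / total_time


def bias_term(avg_effectiveness, h_at_t):
    """B(T) = K * h(T): the amount by which the adjusted failure rate is too low."""
    return avg_effectiveness * h_at_t
```

For example, with hypothetical first-occurrence times of 10, 50, 200 and 600 hours in a 1000-hour test and an assumed average effectiveness factor of 0.70, `bias_term(0.70, estimate_h([10, 50, 200, 600], 1000))` gives the failure-rate correction that would be added to the adjusted estimate.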

Use of the procedures outlined in this report makes it possible to provide an
unbiased estimate of system failure rate. The uncertainty in the estimate of
the effectiveness factors, however, remained a concern. In order to alleviate
this concern, research was conducted on historical fix effectiveness factors
and documented in the following report.

4.2 AMSAA Technical Report No. 388, "Reliability Fix Effectiveness for
Army Systems", Bruce S. Trapnell, May 1983.

The purpose of this report was to provide a historical data base on fix
effectiveness factors for various systems. The advantage of this data base
is that it provides a guide to what might be reasonably expected on similar
systems, serving as a useful tool to the engineer in assignment of
effectiveness factors for projection purposes.

The report details historical effectiveness factors for eleven systems,
to include helicopters, tanks, wheeled vehicles and missiles. The average
demonstrated effectiveness factor for all systems was approximately 0.70, with
relatively small variation.

Work is continuing in this area to determine fix effectiveness by major
subsystems, such as engine, electrical system, etc. These data, broken down to
subsystem level, will be even more useful for projection for future, more
complex systems.

It  is  recognized  that  fix  effectiveness  depends  on  many  factors,  and  that 
the  past  does  not  necessarily  predict  the  future.  The  available  estimates, 
however,  will  provide  a  starting  point  and  will  force  the  expert  to  defend 
large  deviations  from  past  experience. 
4.3 AMSAA Technical Report No. 399, "Corrective Action Review Team
(CART's)," Bruce Trapnell and Clarke Fox, July 1983.


The purpose of this report is to standardize the procedures for determining
effectiveness factors and making projections. It recommends a procedure which
uses historical fix effectiveness factors to modify judgmental estimates. It
further specifies additional data that must be collected to use the projection
model.


5.  CONCLUSIONS 


Estimates of reliability provided for the M1 Abrams tank using procedures
outlined in this paper proved to be quite good, as demonstrated in later
testing. To a large degree, the author feels that this is attributable to the
expertise of the engineers and analysts involved - and a lot of luck. The
procedures could be greatly enhanced by use of available historical fix
effectiveness factors and the projection methodology developed by AMSAA. There
will continue, however, to be situations where expert opinion will be the prime
input to analyses and decisions. It is thus of paramount importance to continue
to develop experts and methodology to best use expert opinion.
ABSTRACT

The  Civil  Service  Reform  Act  of  1978  mandates  performance-based  appraisal 
systems  in  federal  agencies  and  performance  measurements  which  are  accurate 
and  objective  to  "the  maximum  extent  feasible."  In  this  paper  we  study  two 
examples  in  which  objectivity  can  be  defined  as  the  establishment  of  processes 
which test hypotheses against actual data and the evaluation of attendant
$\alpha$ and $\beta$ risks. In the first example, we use the Poisson distribution to
evaluate  performance  against  a  standard  for  courtesy.  This  model  requires 
that  behavior  be  directly  observed  90  percent  of  the  time  for  acceptably  low 
"rudeness  levels"  and  is  thus  impractical.  In  the  second  example,  we  propose 
using  the  binomial  distribution  to  evaluate  the  performance  of  message  center 
clerks  who  have  the  task  of  assigning  "Action/Info"  and  distributing 
correspondence  to  elements  of  a  large  organization.  In  this  case  the  amount 
of  inspection  required  is  affordable. 
INTRODUCTION


The  Civil  Service  Reform  Act  of  1978  (CSRA)  requires  government  agencies 
to  establish  performance-based  appraisal  systems  under  the  general  supervision 
of  the  Office  of  Personnel  Management.  In  pertinent  words  of  the  statute: 

Under  regulations  which  the  Office  of  Personnel 
Management  shall  prescribe,  each  performance  appraisal 
system  shall  provide  for  establishing  performance 
standards  which  will,  to  the  maximum  extent  feasible, 
permit  the  accurate  evaluation  of  job  performance  on 
the  basis  of  objective  criteria  (which  may  include  the 
extent  of  courtesy  demonstrated  to  the  public)  related 
to  the  job  in  question  for  each  employee  or  position 
under  the  system. 

In  compliance  with  the  CSRA,  the  Department  of  the  Army  (DA)  established 
performance-based  appraisal  systems  for  Senior  Executives  (SE),  General  Merit 
(GM)  employees,  and  General  Schedule  (GS)  and  Wage  Grade  (WG)  employees. 
Although  the  three  appraisal  systems  are  covered  by  different  regulations  and 
utilize  different  forms,  they  share  similar  structure,  vocabulary,  and 
management  philosophy  to  the  extent  that  one  may  speak  of  the  "Army  Appraisal 
System" (AAS). Under the AAS, supervisors are to provide each employee with a
written  Individual  Performance  Plan  (IPP)  at  the  beginning  of  a  rating  period. 
In an IPP, related Tasks/Activities are grouped into Job Elements
described by short titles such as Personnel Management, Preparation [of]
Correspondence, Safety, etc. Some Job Elements are mandatory for supervisors;
otherwise, a great deal [...] in grouping tasks and naming Job Elements. Each
Task/Activity is accompanied by [a] standard which expresses [...].
Additional standards not keyed to specific tasks may be [...] as a whole.
[...] supervisors usually [...] standards per Job Element. Less [...]
required to cover a [non]supervisory position.

System doctrine requires that standards be quantified whenever possible,
express a range of acceptable performance, and provide [the] employee an
opportunity to excel. This doctrine may be breached by the [...] standards
provided such standards are not an abuse of [discretion]. Absolute standards
[may] be used in situations where a single failure could cause [death,
injury,] breach of security, or great monetary loss. Thus a standard may
require a pilot to make preflight checks 100 percent of the time, but a
standard allowing no typing errors would be an abuse of discretion.

[At the end] of the performance period covered [by] the IPP, the rating
supervisor is required to make an [...] of performance ($P_i$) against each
standard ($S_i$) and make a judgment [of Exceeded] (E), Met (M), or Not Met
(N) for each Job Element. [It is] common, but sloppy, practice to use the
words "exceeded," "met," and "not met" in comparing each $P_i$ to [its]
associated $S_i$.
These words have been mentioned (selected as names of ratings for entire Job
Elements, which usually contain more than one standard) and are not logically
available for use in any other context. In order to avoid confusion, we use
the separate and distinct designators Above Tolerance (A), Within Tolerance
(W), and Below Tolerance (B) for this comparison. No algorithm for mapping an
(A,W,B) set for a Job Element into E, M, or N is provided in the system
design. It is indeed within the purview of a rating supervisor to rate an
employee E or M on a Job Element even though a specific $P_i$ to $S_i$
comparison within the element leads to a conclusion of Below Tolerance. (A
reviewing official might require that such a supervisor explain his/her
decision!) Following determination of the (E,M,N) set of ratings of Job
Elements, an OPM-approved algorithm is used to arrive at a final adjectival
rating of Exceptional (EX), Highly Successful (HS), Fully Successful (FS),
Minimally Satisfactory (MS), or Unsatisfactory (U).

So  far  we  have  merely  provided  a  brief  description  of  the  structure  and 
vocabulary of the AAS. The appraisal systems of other agencies are quite
similar.  In  the  remainder  of  the  paper  we  examine  the  implications  of 
attempting  to  be  objective  within  such  a  system,  objectivity  being  a  statutory 
requirement. 

In order to have specific examples, we introduce two mathematical
models. In the first we propose to measure courtesy by direct observations.
In the second we propose to measure by actual sampling the accuracy of an
Action/Info Clerk in an administrative office who is supposed to route
incoming mail to the appropriate subdivisions of a large organization. Before
continuing, we note that many supervisors write standards in the form "No more
than N substantiated complaints of ________ during the performance
period."¹ (The reader may fill in the blank.) For the purposes of this paper
we  eschew  shortcuts  which  allow  conclusions  in  the  absence  of  data.  Instead, 
we  require  that  actual  observations  be  used  to  test  hypotheses  and  assess  the 
attendant  risks  of  drawing  wrong  conclusions.  Since  one  purpose  of 
performance-based  appraisal  systems  is  to  provide  a  basis  for  rewarding 
employees  whose  performance  is  above  acceptable  standards,  the  difference 
between  ordinary  good  performance  and  exemplary  performance  should  be 
detectable  by  the  measurement  paradigm.  Antithetically,  less  than  acceptable 
performance  should  also  be  detectable  in  order  to  validate  corrective  action 
for  nonacceptable  performance. 
STANDARDS FOR COURTEOUS BEHAVIOR


It  is  noted  in  the  Introduction  that  the  CSRA  specifically  mentions 
"courtesy  demonstrated  to  the  public"  as  an  evaluation  factor  in  job 
performance. In the same legislation, Congress has provided for a suspension
of only 14 days or less for four instances of discourtesy within a one-year
period.² Considerable discussion of courtesy standards has been provided by
the U.S. Merit Systems Protection Board (MSPB).³ It is clear from these
references  that  courtesy  should  not  be  the  subject  of  an  absolute  standard. 
It  may  seem  paradoxical,  but  a  level  of  rudeness  must  be  allowed  if  courtesy 
is  to  be  measured  and  rewarded.  In  our  own  review  of  IPPs,  we  note  that 
courtesy  standards  are  commonly  imposed  on  employees  in  Secretary/Receptionist 
type  positions  and  rarely  on  others.  As  a  side  comment,  this  would  appear  to 
be  unintentional  discrimination  against  incumbents  in  a  particular  job 
category. 

We find that courtesy standards are usually written in the "No more than
$N \pm \delta$ complaints received" form. We propose a standard written in terms of
"No more than $N \pm \delta$ incidents of discourtesy allowed." This would seem to be
appropriate  since  most  employees  are  under  direct  observation  by  a  supervisor 
for  some  fraction  of  time.  (As  a  thought  experiment,  we  could  imagine 
employing  an  inspector  to  observe  the  employee  through  a  one-way  window  for 
whatever  fraction  of  time  is  needed  to  ensure  objectivity  in  the  sense 
intended  here.)  It  is  assumed  that  incidents  of  discourtesy  are  random, 
isolated  in  time,  uncorrelated  and  that  the  probability  of  an  incident  during 
a  time  interval  is  proportional  to  the  duration  of  the  interval.  Provided  the 
number  of  incidents  is  small,  these  assumptions  are  reasonable  and  permit  the 
use  of  the  well-known  Poisson  distribution. 
Given a "rudeness allowance" of $N \pm \delta$ incidents per year, we can only
estimate the actual performance level, $P_a$, by hypothesis testing. We seek
mathematically consistent sets of the following parameters:

$F$        = Fraction of time observed.

$R_a$      = Acceptance range. If the number of observed discourteous acts
             is within this range, the sample supports the conclusion that
             the performance is within tolerance with a given risk of being
             wrong.

$\alpha_E$ = Employee's risk that a within tolerance or better performance
             will be rated as below tolerance.

$\alpha_S$ = Supervisor's risk that a within tolerance or worse performance
             will be rated as above tolerance.

$\beta_E$  = Employee's risk that above tolerance performance will not be
             detected.

$\beta_S$  = Supervisor's risk that below tolerance performance will not be
             detected.
Mathematical details are presented in appendix I-B. A short table of
results follows:

Standard    $F$     $R_a$    $\alpha_E = \alpha_S$    $\beta_E | P_a$    $\beta_S | P_a$

2 ± .5      .90     0-3      .25                      1.00 | 1           .71 | 3
2 ± .5      1.00    1-3      .25                      .63 | 1            .65 | 3
20 ± 5      .25     1-10     .10                      .92 | 10           .86 | 30
200 ± 50    .25     30-73    .10                      .18 | 100          .44 | 300

In the first line of the table, we set the rudeness level at 2 ± .5
incidents per year. (The artificiality of setting $\delta$ as half of an incident
merely facilitates computation in the small $N$ regime.) The proposed fraction
of time observed in this line is rather high, 90 percent. Then if the number
of observed incidents of discourtesy is in the range 0-3, inclusive, the rater
may conclude that performance is within tolerance with a probability greater
than $\alpha_E + \alpha_S = .5$ of having drawn the wrong conclusion. It might
seem that if the actual number of incidents of discourtesy is 3 against a
standard of $N = 2 \pm .5$ the performance was surely out of tolerance. Not
necessarily. When $N \pm \delta$ is used to parameterize the Poisson
distribution, it applies either to an ensemble of employees, or to individual
behavior over many performance periods. Then $P_a$, the actual performance for
a given period, becomes a stochastic variable and an observation of three
incidents does not show that $N \ne 2$. (Subtleties of interpretation in the
small $N$ regime disappear for larger values of $N$.) The next entry,
$\beta_E | P_a = 1.00 | 1$, is the probability (1.00) that a better
performance ($N = 1$) would not be detected, and $\beta_S | P_a = .71 | 3$ is
the probability (.71) that a worse performance ($N = 3$) would not be detected.
The second line merely exhibits a decrease in risks if inspection is increased
to 100 percent. In the final line, we decrease inspection time and lower
risks by degrading the standard to the point of allowing almost four incidents
of discourtesy per week. The dilemma is apparent. Objective validation of
performance against a high standard requires a lot of inspection time.
Maintenance of the objective process with reduced inspection time requires
that the standard be degraded to an unacceptable level.
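The entries in the courtesy table can be explored with a short sketch. It is not the authors' appendix I-B computation, which is not reproduced here; it assumes that the observed count is Poisson with mean equal to the annual rate times the fraction of time observed, and that each rejection region is the largest tail whose probability at the corresponding tolerance edge does not exceed the stated $\alpha$. The function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P[X <= k] for a Poisson variable with the given mean (0 for k < 0)."""
    if k < 0:
        return 0.0
    term = total = math.exp(-mean)
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def acceptance_range(n, delta, fraction, alpha):
    """Return (low, high): observed incident counts rated Within Tolerance."""
    mean_good = (n - delta) * fraction  # expected count at the better edge
    mean_bad = (n + delta) * fraction   # expected count at the worse edge
    # Rate Above when x <= x_a, the largest count with P[X <= x_a] <= alpha.
    x_a = -1
    while poisson_cdf(x_a + 1, mean_good) <= alpha:
        x_a += 1
    # Rate Below when x >= x_b, the smallest count with P[X >= x_b] <= alpha.
    x_b = 0
    while 1.0 - poisson_cdf(x_b - 1, mean_bad) > alpha:
        x_b += 1
    return x_a + 1, x_b - 1


def beta_employee(low, actual_n, fraction):
    """P[X >= low]: a better-than-standard performance is not rated Above."""
    return 1.0 - poisson_cdf(low - 1, actual_n * fraction)


def beta_supervisor(high, actual_n, fraction):
    """P[X <= high]: a worse-than-standard performance is not rated Below."""
    return poisson_cdf(high, actual_n * fraction)
```

Calling `acceptance_range(2, 0.5, 0.90, 0.25)` and then `beta_employee` and `beta_supervisor` with actual rates of 1 and 3 incidents per year gives values that can be compared with the first line of the table. When the lower end of the range is 0, no observation can ever support an Above Tolerance rating, which is why that risk equals 1.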

In  the  case  of  Callaway  versus  DA,  the  MSPB  reversed  a  removal  action 
against  the  appellant  which  was  based  partially  on  failure  to  perform  in 
accordance with an absolute (N = 0) courtesy standard. Absolute standards are
likely  to  be  judged  by  the  MSPB  as  an  abuse  of  agency  discretion  except  in 
"situations  where  death,  injury,  breach  of  security,  or  great  monetary  loss 
could  result  from  a  single  failure  to  meet  the  performance  standard  measuring 
performance  of  a  critical  element."  That  issue  is  quite  different  from  the 
one  addressed  here,  namely,  the  objective  measurability  of  performance  against 
a  nonabsolute  standard. 

A standard written in the form "No more than $N \pm \delta$ substantiated
complaints of discourtesy during the performance year" has the advantage of
being easy to administer. Such a standard places the inspection and reporting
responsibility on the public and coworkers rather than the supervisor.
However, the measurement is now a joint property of employee behavior and
tolerance thresholds of potential complainants. In practice, few or no
reports will actually be received. Trivialized and easy to administer
standards lead to "Above Tolerance" decisions in the absence of data and
contribute significantly to rating inflation. Were it not for the statutory
status of courtesy standards, we would recommend that they be used only on a
management by exception basis and not ordinarily included in IPPs. The
question of whether or not the adoption of this policy would violate the
intent of Congress is debatable.
STANDARDS FOR A MESSAGE FORWARDING CLERICAL FUNCTION


The  task  in  this  example  is  that  of  sorting  a  large  volume  of  incoming 
messages,  assigning  "Action/Info"  to  each,  and  distributing  the  messages  to 
appropriate  elements  of  a  large  organization.  While  many  actions  are  purely 
routine,  others  require  an  appreciation  of  message  content  and  knowledge  of 
the  mission  and  functions  of  organizational  elements.  We  assume  that  the 
workload  is  sufficiently  large  to  allow  use  of  the  binomial  distribution  to 
describe  sampling  without  replacement.  (The  Message  Center  at  White  Sands 
Missile  Range  processes  about  50,000  such  actions  per  year.  The  function  is 
performed  by  three  to  four  employees  who  also  have  other  duties.)  We  further 
neglect the fact that "Action" errors are usually more serious than "Info"
errors. Performance standards for the employees are assumed to be in the form
"$P \pm \delta$ percent of Action/Info determinations are correct." A sample of
size $n$ is to be drawn at random for inspection during the performance year.
It is assumed that the inspecting supervisor's determination of "correct" or
"incorrect" on each sample element is error free. $P_a$, the actual
performance to be estimated, is expressed as a percentage. $R_a$ is the
observed range of correct actions within a given sample of size $n$ that allows
acceptance of the hypothesis that performance is within tolerance with risks as
defined previously. Mathematical details are presented in appendix I-C. As
with the previous example, we exhibit a short table of results.


Standard    $n$     $R_a$        $\alpha_E = \alpha_S$    $\beta_E | P_a$    $\beta_S | P_a$

85 ± 5%     100     76-93        .15                      .23 | 95           .46 | 75
94 ± 2%     100     89-98        .15                      .60 | 98           .70 | 90
94 ± 2%     500     450-487      .05                      .21 | 98           .54 | 90
94 ± 2%     1000    906-970      .05                      .02 | 98           .28 | 90
94 ± 2%     1500    1362-1452    .05                      .001 | 98          .16 | 90
94 ± 2%     2000    1820-1934    .05                      .0001 | 98         [illegible]
In the data selected for presentation, we begin with a pedestrian level of
performance, 85 ± 5 percent, a small sample size, $n = 100$, and exhibit rather
high risks. As would be expected, the second line shows that escalating the
standard and keeping $n = 100$ increases the risks. In the remainder of the
table we maintain a high standard and keep increasing $n$ in order to decrease
$\beta_E | P_a$ and $\beta_S | P_a$.

We searched for a sample size and risks of about 10 percent or less as
exhibited in the last line of the table. An interesting feature of the
results is that for fixed $P \pm \delta$ and $\alpha_E = \alpha_S$,
$\beta_E | P_a$ decreases much faster than $\beta_S | P_a$ as $n$ increases.
Balanced risks of about 10 percent across the board are not inherent in the
model. At the sampling level of $n = 2,000$ the risk of being unfair to the
employee is negligible. We speculate that competent, self-confident employees
would resent increased inspection, although analysis shows that it would be in
their best interest. It should also be noted that $R_a$ is wider in every case
than the nominal range of $P \pm \delta$ (expressed as decimal fractions) times
$n$. This is to be expected in a stochastic model; observations outside the
nominal range do not necessarily indicate an out of tolerance condition. This
is not generally understood by supervisors.

Should it turn out that the number of correct Action/Info determinations
in the sample of $n = 2,000$ is more than the top of the range, namely 1,934,
that fact along with performance against other standards in the employee's IPP
should be an evaluation factor in considering the employee for a performance
award. Similarly, an observed number of correct determinations below the
bottom of the range, 1,820, indicates a need for corrective action. If the
scheme is applied to each of three employees, the total sample is $n = 6,000$,
about 12 percent of workload. The standard of 94 ± 2 percent is high enough
to represent a good operation, yet low enough to allow employees an
opportunity to excel. The amount of inspection is affordable and the paradigm
is objective.
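The acceptance ranges and risks discussed for the message center example can be sketched as follows. This is an illustration rather than the appendix I-C computation: it assumes the number of correct determinations in the sample is binomial, and that each rejection region is the largest tail whose probability at the corresponding tolerance edge does not exceed $\alpha$. Function names are hypothetical.

```python
import math


def binom_pmf_table(n, p):
    """Return [P[X = k] for k = 0..n] for X ~ Binomial(n, p), via log-gamma."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    return [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]


def acceptance_range(n, p_low, p_high, alpha):
    """Return (low, high): counts of correct actions rated Within Tolerance."""
    # Rate Below when x <= below_cut: the largest lower tail, evaluated at
    # the lower tolerance edge p_low, whose probability is at most alpha.
    pmf = binom_pmf_table(n, p_low)
    tail, below_cut = 0.0, -1
    while below_cut + 1 <= n and tail + pmf[below_cut + 1] <= alpha:
        below_cut += 1
        tail += pmf[below_cut]
    # Rate Above when x >= above_cut: the largest upper tail at p_high.
    pmf = binom_pmf_table(n, p_high)
    tail, above_cut = 0.0, n + 1
    while above_cut - 1 >= 0 and tail + pmf[above_cut - 1] <= alpha:
        above_cut -= 1
        tail += pmf[above_cut]
    return below_cut + 1, above_cut - 1


def beta_employee(high, n, p_actual):
    """P[X <= high]: an above-tolerance performer is not rated Above."""
    return sum(binom_pmf_table(n, p_actual)[: high + 1])


def beta_supervisor(low, n, p_actual):
    """P[X >= low]: a below-tolerance performer is not rated Below."""
    return sum(binom_pmf_table(n, p_actual)[low:])
```

For instance, `acceptance_range(100, 0.92, 0.96, 0.15)` together with `beta_employee(high, 100, 0.98)` and `beta_supervisor(low, 100, 0.90)` can be compared with the second line of the table, and repeating the call with larger `n` shows how the two risks shrink at different rates.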
There is a more sophisticated procedure for making A, W, or B decisions
than that given above. Depending on the data, these decisions may be
classified as strong or weak. The supervisor may wish to give the employee
the benefit of any doubt and escalate a decision from B to W or from W to A,
or gain further confidence that a B decision justifies corrective action. The
basic theory can be found in the literature of statistics⁴,⁵ and an example is
provided in appendix II.

A standard relating to filing errors was a second issue in the case of
Callaway versus DA.³ No more than two filing errors were allowed during an
"annual  files  inspection."  Errors  were  found  during  an  inspection  in 
preparation  for  the  "1982  Annual  General  Inspection";  and  the  agency  claimed 
that  the  performance  standard  applied  to  any  inspection.  The  MSPB  thought 
otherwise  and  found  in  favor  of  plaintiff  on  this  count.  One  lesson  from  this 
case  is  that  inspection  related  to  performance-based  appraisal  systems  should 
be  defined  in  terms  of  on-going  processes  for  monitoring  performance  rather 
than  scheduled  general  inspections.  Moreover,  if  we  may  speculate  that  the 
filing  workload  in  this  case  was  high  enough  to  allow  an  analytical  model  such 
as  the  one  used  in  this  example,  then  the  standard  itself  was  faulty.  It 
should  have  been  expressed  as  a  percentage  of  allowed  incorrect  actions  with  a 
range,  set  high  enough  to  allow  a  good  operation  yet  low  enough  to  provide  the 
employee  an  opportunity  to  excel,  and  monitored  by  an  objective  process.  As 
did  the  MSPB,  we  would  find  in  favor  of  plaintiff,  but  with  different 
reasoning. 

There is another case, that of Walker versus Treasury,⁶ in which the
techniques of this paper can be applied in a critique. Walker's task was
specifically that considered here, namely distribution of correspondence.
The agency had been operating with an 86 percent accuracy standard. It
changed from this rate-type standard to a number of errors-type standard which
translates back to a rate standard of 99.5 ± .2 percent. Appellant was
allowed 0-3 errors per month on a workload of about 500 pieces of
correspondence. She in fact averaged about 9 errors per month, committed
10 errors during a 1-month probationary period, and was removed from her
position. Among other things, she claimed that the new standard was
unreasonably high. The agency claimed that other employees were able to
achieve the standard, but did not present convincing evidence of this claim to
the MSPB. In critiquing this case, we have two findings: (1) The new
standard provided no opportunity for any employee to excel. As shown in
appendix I, validation of an above tolerance performance would require
observation of a negative number of errors, an impossibility. (2) Had the
agency wished to document achievability, the table in appendix I shows that
the sample size would have had to be larger than the workload, another
impossibility. Our analysis is supportive of the MSPB decision to order the
reinstatement of Walker to her position.
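The first finding can be illustrated with a small calculation. The sketch assumes that a full month's workload of 500 pieces is inspected, that errors are independent, and that the upper tolerance edge is 99.7 percent (99.5 + .2); the significance levels tried are illustrative, since the appendix table for this case is not reproduced here. Function names are hypothetical.

```python
def prob_perfect_month(upper_accuracy, workload):
    """Probability of zero errors when each piece is correct with upper_accuracy."""
    return upper_accuracy ** workload


def can_detect_above_tolerance(upper_accuracy, workload, alpha):
    """True if even a flawless month is unlikely enough (<= alpha) at the upper
    tolerance edge to support an Above Tolerance rating."""
    return prob_perfect_month(upper_accuracy, workload) <= alpha


# An employee performing exactly at the top of the tolerance band would still
# produce a perfect month with this probability.
p_zero_errors = prob_perfect_month(0.997, 500)
```

Under these assumptions a perfect month occurs by chance more than one time in five at the upper tolerance edge, so for any significance level below that probability the best possible observation, zero errors, cannot be distinguished from within tolerance work; a rejection region would have to start below zero errors.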
COMMENTS AND CONCLUSIONS


The  Civil  Service  Reform  Act  of  1978  mandates  performance-based  appraisal 
systems  and  performance  measurement  which  is  objective  and  accurate  to  the 
"maximum  extent  feasible."  It  is  appropriate,  therefore,  to  systematically 
investigate  the  extent  to  which  performance  measurement  can  be  made  objective 
and  accurate. 

In  our  exploration  of  this  issue,  we  have  chosen  examples  in  which 
objectivity  can  be  defined  in  terms  of  processes  which  use  actual  data  to  test 
hypotheses and evaluate related $\alpha$ and $\beta$ risks. This definition of objectivity
is  a  standard  tool  in  all  of  measurement  science.  However,  in  establishing 
objective  processes  one  also  must  consider  the  cost  of  inspection  in  time  or 
money.  On  this  basis,  the  model  for  validating  performance  against  courtesy 
standards  must  be  judged  impractical,  whereas  the  model  for  evaluating  the 
work  of  "Action/Info"  clerks  in  a  message  center  appears  to  be  worthy  of 
adoption. 

The  analytical  approach  used  in  this  paper  is  not  applicable  in  many 
cases.  Some  standards  are  inherently  easy  to  administer.  For  example,  a 
"Timeliness"  standard  requires  very  little  inspection  time,  it  being  easy  to 
determine whether or not a piece of work is rendered on time. Most
performance standards of managers and executives are stated in terms of
organizational  objectives,  do  not  involve  repetitive  tasks,  and  are  not 
amenable  to  statistical  treatment.  However,  the  basic  tension  between 
objectivity  and  inspection  time  can  never  be  avoided.  In  this  regard,  one 
must  also  consider  the  total  number  of  standards  to  be  monitored  by  a  single 
supervisor. For example, consider a GM-14 who rates three GM-13s and two
nonsupervisory personnel. Job analysis and the structuring of IPPs in
accordance  with  the  "school  solution"  will,  in  this  case,  generate  about 
150  performance  standards.  Some  of  these  will  be  easy  to  administer,  some 
will  not.  Some  will  be  amenable  to  hypothesis  testing,  many  will  not.  In  any 
case,  it  is  clear  that  effective  use  of  performance-based  appraisal  systems 
requires  orderly  planning  of  inspection  time. 

Hypothesis  testing  should  be  used  in  those  cases  where  analysis  shows  it 
to  be  feasible.  Any  lesser  definition  of  "objectivity"  in  such  cases  would  be 
indefensible. 
Appendix I: APPLICABLE HYPOTHESIS TESTING


A.  Background. 

Hypothesis testing is a widely used, well documented method for comparing
a parameter, $\theta$, with a standard, $\theta_0$. The basic procedure is to
assume a null hypothesis, $H_0$, and reject $H_0$ only if there is sufficient
experimental evidence that the assumption is unlikely. The significance level,
called the Type I risk and denoted by $\alpha$, is the minimum acceptable
likelihood that the experimental data could be obtained if $H_0$ is true. An
alternate hypothesis, $H_a$, is for use if $H_0$ is rejected.

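The straightforward hypotheses for performance appraisal would be

$$H_0: \theta_L \le \theta \le \theta_U \iff \text{Within Tolerance (W)}$$

and

$$H_a: \theta < \theta_L \iff \text{Above Tolerance (A)} \quad \text{or} \quad \theta > \theta_U \iff \text{Below Tolerance (B)}$$

where $\theta_0$ is replaced by a tolerance range $\theta_L$ to $\theta_U$. The
Type I risk would be

$$\alpha = P[\,\text{Rating } \theta < \theta_L \text{ or } \theta > \theta_U \mid \theta_L \le \theta \le \theta_U\,] = P[\,\text{Rating A or B} \mid \text{W}\,].$$

An opposing risk, called the Type II risk and denoted by $\beta$, would be

$$\beta = P[\,\text{Rating } \theta_L \le \theta \le \theta_U \mid \theta < \theta_L \text{ or } \theta > \theta_U\,] = P[\,\text{Rating W} \mid \text{A or B}\,].$$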



The straightforward way to design the hypothesis test would be to

(1) Select an $\alpha$.

(2) Select a size for the planned data set.

(3) Use $\alpha$ to determine a range of data, $R_a$, which is defined by
$x_A + 1$ to $x_B - 1$, within which a measurement does not indicate a rating of
either A or B.

(4) Use $(x_A + 1) \le x \le (x_B - 1)$ and the planned data set size to
determine $\beta$ for values of $\theta$ such that $\theta < \theta_L$ or $\theta > \theta_U$.

(5) Repeat steps (1) through (4) until the supervisor and employee
agree on a triplet of $\alpha$, planned data set size, and $\beta$'s.

Unfortunately, the well-known mathematical relations between $\alpha$, $x_A$, $x_B$,
and $\beta$'s are based on a standard that is an equality, or at least a
semi-infinite range, instead of a finite range. This problem may be handled by
performing two hypotheses tests simultaneously. These are:

$H_0^1$: $\theta = \theta_L$ <===> W or B          $H_0^2$: $\theta = \theta_U$ <===> W or A

$H_a^1$: $\theta < \theta_L$ <===> A               $H_a^2$: $\theta > \theta_U$ <===> B

The = signs in the null hypotheses may be replaced with $\ge$ in $H_0^1$ and $\le$ in $H_0^2$.
This change to semi-infinite standards does not change the application of the
tests but it does make the interpretation of the tests clearer.


This set of tests will yield a unique member of the A, W, B set for any
measurement. The associated Type I and Type II risks are:

$\alpha_S = \alpha^1 = P[\,\text{Rating } \theta < \theta_L \mid \theta = \theta_L\,] = P[\,\text{Rating A} \mid \text{W or B}\,]$

$\alpha_E = \alpha^2 = P[\,\text{Rating } \theta > \theta_U \mid \theta = \theta_U\,] = P[\,\text{Rating B} \mid \text{W or A}\,]$

$\beta_E = \beta^1 = P[\,\text{Rating } \theta = \theta_L \mid \theta < \theta_L\,] = P[\,\text{Rating W or B} \mid \text{A}\,]$ and

$\beta_S = \beta^2 = P[\,\text{Rating } \theta = \theta_U \mid \theta > \theta_U\,] = P[\,\text{Rating W or A} \mid \text{B}\,]$

where the E and S subscripts designate the employee's and supervisor's risks.

The various Type I and Type II risks in the single test and the two
simultaneous tests are not as simply interpreted as those for a hypothesis
test which has only two possible outcomes. Insight into these relations may be
obtained by examining figures 1 through 5. One interesting result, which
relates the three Type I risks defined above, is shown by figure 5 to be

$(\alpha_S + \alpha_E) < \alpha$.

This inequality can be made to approach an equality only if the actual
W domain is made much larger than the domains of B and A.

The two simultaneous hypotheses tests are performed by comparing the
measurement, x, with test parameters, $x_A$ and $x_B$. For a discrete distribution
in which the probability of measuring x events is given by $f(x;\theta)$, the maximum
x which implies $\theta < \theta_L$ is denoted by $x_A$ and is the largest x making

$\sum_{i=x_0}^{x} f(i;\theta_L) \le \alpha_S$

where $x_0$ is the lowest value of i making $f(i;\theta) > 0$. Similarly, $x_B$ is the
minimum x which implies $\theta > \theta_U$ and is the smallest x making

$\sum_{i=x}^{x_\infty} f(i;\theta_U) \le \alpha_E$ or

$\sum_{i=x_0}^{x-1} f(i;\theta_U) \ge (1 - \alpha_E)$

where $x_\infty$ is the highest value of i making $f(i;\theta) > 0$. If data yields an x
such that $x_A < x < x_B$ or $(x_A + 1) \le x \le (x_B - 1)$, the null hypotheses are
both accepted and the assumed rating is W. On the other hand, $x \le x_A$ implies
A and $x \ge x_B$ implies B.
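
The search for the two test parameters can be sketched in a few lines of code. The following Python sketch assumes a binomial f(x; theta) with x counting "bad" events; the function names (`binomial_pmf`, `critical_values`, `rate`) and the sample inputs are illustrative choices, not part of the original paper.

```python
def binomial_pmf(n, q):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, q), with 0 < q < 1."""
    probs = [(1.0 - q) ** n]
    odds = q / (1.0 - q)
    for i in range(n):
        # Recurrence: P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * q / (1 - q)
        probs.append(probs[-1] * (n - i) / (i + 1) * odds)
    return probs


def critical_values(n, theta_lower, theta_upper, alpha_s, alpha_e):
    """Return (x_A, x_B) for the two simultaneous one-sided tests."""
    # x_A: largest x whose lower-tail probability at theta_lower is <= alpha_s
    # (stays at -1 when even x = 0 is too probable to indicate rating A).
    x_a, cumulative = -1, 0.0
    for x, p in enumerate(binomial_pmf(n, theta_lower)):
        cumulative += p
        if cumulative > alpha_s:
            break
        x_a = x
    # x_B: smallest x whose upper-tail probability at theta_upper is <= alpha_e.
    upper = binomial_pmf(n, theta_upper)
    x_b, tail = n + 1, 0.0
    for x in range(n, -1, -1):
        tail += upper[x]
        if tail > alpha_e:
            break
        x_b = x
    return x_a, x_b


def rate(x, x_a, x_b):
    """Map a count of "bad" events to A (above), W (within), or B (below)."""
    if x <= x_a:
        return "A"
    if x >= x_b:
        return "B"
    return "W"


x_a, x_b = critical_values(2000, 0.003, 0.007, 0.05, 0.05)
print(x_a, x_b, rate(5, x_a, x_b))
```

The probabilities are built by a recurrence rather than from factorials so that large n does not overflow; any other discrete distribution could be substituted by replacing `binomial_pmf`.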

It should be noted that the calculations of $x_A$ and $x_B$ yield worst case
values if the null hypotheses are inequalities. Each equation is the well-known
result when the null hypothesis is an equality. The use of $\theta_L$ and $\theta_U$ as
ends of semi-infinite intervals corresponds to the worst cases in those
intervals.

The Type II risks for the two simultaneous hypotheses tests are calculated
from $x_A$ and $x_B$ by

$\beta_E = \sum_{i=x_A+1}^{x_\infty} f(i;\theta) = 1 - \sum_{i=x_0}^{x_A} f(i;\theta)$ and

$\beta_S = \sum_{i=x_0}^{x_B-1} f(i;\theta)$.

For sufficiently low values of $\alpha$ and large values of $x_\infty - x_0$, $\beta_E$ and $\beta_S$ will
differ only slightly from the traditional $\beta$ risk given by

$\beta = \sum_{i=x_A+1}^{x_B-1} f(i;\theta) = \sum_{i=x_0}^{x_B-1} f(i;\theta) - \sum_{i=x_0}^{x_A} f(i;\theta)$

because

$\sum_{i=x_0}^{x_B-1} f(i;\theta) \approx 1$ and $\sum_{i=x_0}^{x_A} f(i;\theta) \approx 0$

for the values of $\theta$ that are of interest in the calculation of $\beta_E$ and $\beta_S$,
respectively.

The Type I risks, Type II risks, and number of measurements taken are
inter-related and competing factors. The balancing of these factors must
result from consideration of (1) proposed values of $\alpha_E$, $\alpha_S$, and the number of
measurements and (2) the mathematically resulting values of $\beta_E$ and $\beta_S$. The
employee and supervisor can be aided in their balancing consideration by
operating characteristic (OC) curves. The OC-curve is a graph of the Type II
risk versus $\theta$ with the number of measurements as a parameter. The employee
naturally wants an OC-curve with $\alpha_E$ and $\beta_E$ small while the supervisor wants
both $\alpha_S$ and $\beta_S$ small.
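
As an illustration of how points on an OC-curve might be tabulated, the Python sketch below evaluates the two Type II risks for a binomial f(x; theta), given test parameters x_A and x_B that are assumed to have been determined already. The helper names and the sample values of theta are hypothetical.

```python
def binomial_cdf(x, n, q):
    """Return P(X <= x) for X ~ Binomial(n, q), with 0 < q < 1."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    term = (1.0 - q) ** n          # P(X = 0)
    total = term
    odds = q / (1.0 - q)
    for i in range(x):
        term *= (n - i) / (i + 1) * odds   # term is now P(X = i + 1)
        total += term
    return min(total, 1.0)


def beta_employee(theta, n, x_a):
    """P[rating W or B | theta]: the count exceeds x_A although theta < theta_L."""
    return 1.0 - binomial_cdf(x_a, n, theta)


def beta_supervisor(theta, n, x_b):
    """P[rating W or A | theta]: the count stays below x_B although theta > theta_U."""
    return binomial_cdf(x_b - 1, n, theta)


def oc_points(thetas, n, x_a, x_b):
    """Return (theta, beta_E, beta_S) triples for plotting an OC-curve."""
    return [(t, beta_employee(t, n, x_a), beta_supervisor(t, n, x_b)) for t in thetas]


# Hypothetical design: n = 2000 with x_A = 1 and x_B = 21.
for theta, b_e, b_s in oc_points([0.0015, 0.0085], 2000, 1, 21):
    print(theta, round(b_e, 3), round(b_s, 3))
```

Only the beta_E value for theta below the lower limit and the beta_S value for theta above the upper limit are meaningful as risks; the sketch computes both at every theta simply to keep the interface small.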

B.  Poisson. 

The Poisson distribution function,

$p(x;\lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$ for x = 0, 1, 2, ...,

describes the distribution of the random variable x in time t provided that t
can be divided into intervals $\Delta t$ such that:

i) $P[\,x > 1 \text{ in } \Delta t\,] = 0$,

ii) $P[\,x = 1 \text{ in } \Delta t\,] = (k)(\Delta t)$ where $\lambda = kt$, and

iii) $x_i$ is independent of $x_j$ where i and j refer to any two different
intervals.

The use of $f(x;\theta) = p(x;\lambda)$, $\theta = \lambda$, $x_0 = 0$, and $x_\infty = \infty$ in the equations of
Section I-A yields formulas for the design of simultaneous, Poisson hypotheses
tests to select an A, W, or B performance rating.

The parameter $\lambda$ is a meaningful property to test. It is the mean value of
x in time t. (Interestingly but usually less directly applicable, $\lambda$ is also
the variance of x in time t.) The additive property of $\lambda$,

$\lambda_{t_1 + t_2} = \lambda_{t_1} + \lambda_{t_2}$

for any nonoverlapping times $t_1$ and $t_2$, makes the actual substitution for the
parameter $\theta$ equal to the product of F and $\lambda$ instead of $\lambda$. Here F is the
fraction of the time t, for which $\lambda$ is the mean, that observations are made in
the measurement of x.
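
A minimal Python sketch of the Poisson case follows, with the tolerance limits scaled by the observed fraction F before the test parameters are searched for. The function names and the numerical inputs (limits of 12 and 28 events per period, F = 0.5) are hypothetical and chosen only for illustration.

```python
import math


def poisson_cdf(x, mean):
    """Return P(X <= x) for X ~ Poisson(mean)."""
    if x < 0:
        return 0.0
    term = math.exp(-mean)         # P(X = 0)
    total = term
    for i in range(1, x + 1):
        term *= mean / i           # term is now P(X = i)
        total += term
    return min(total, 1.0)


def poisson_critical_values(lam_lower, lam_upper, fraction, alpha_s, alpha_e):
    """Return (x_A, x_B) when only a fraction F of the time t is observed."""
    theta_lower = fraction * lam_lower   # theta = F * lambda
    theta_upper = fraction * lam_upper
    # x_A: largest x with P(X <= x | theta_lower) <= alpha_s, or -1 if none.
    x_a = -1
    while poisson_cdf(x_a + 1, theta_lower) <= alpha_s:
        x_a += 1
    # x_B: smallest x with P(X >= x | theta_upper) <= alpha_e.
    x_b = 0
    while 1.0 - poisson_cdf(x_b - 1, theta_upper) > alpha_e:
        x_b += 1
    return x_a, x_b


print(poisson_critical_values(12.0, 28.0, 0.5, 0.05, 0.05))
```

Observing for a smaller fraction F lowers both scaled means, which in general narrows the gap between x_A and x_B and raises the Type II risks for a fixed pair of Type I risks.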

C.  Binomial. 

The binomial distribution function,

$b(x;n,q) = \frac{n!}{(n-x)!\,x!}\, q^{x} (1-q)^{n-x}$ for x = 0, ..., n,

describes the distribution of the random variable x provided the following
conditions are met:

i) x is the number of "bad" events in a random sample of size n
selected from an infinite, dichotomous population,

ii) P[x = 1] = q when n = 1.

The use of $f(x;\theta) = b(x;n,q)$, $\theta = q$, $x_0 = 0$, and $x_\infty = n$ in the
equations of section I-A yields formulas for simultaneous, binomial hypotheses
tests to select an A, W, or B performance rating.

Either the parameter q or its mirror image parameter p = 1-q is a meaningful
parameter to test. They are respectively the fraction defective and
fraction correct of the population. To use the language of "goodness" instead
of "badness", simply substitute 1-p for $\theta$ and y = n-x for x and use $y_A = n - x_A$
and $y_B = n - x_B$ in the acceptance range of $y_A > y > y_B$. When either the p or q
description is desired, it may be advantageous to do the calculations in the
opposite interpretation because of available tables and/or computer programs.

The design of binomial hypotheses tests involves the balancing of $\alpha$, $\beta_E$,
$\beta_S$, and n for a justifiable tolerance interval. Figures 6 and 7 present
OC-curves for a reasonably high tolerance interval and low Type I risks.
These may be used to balance the risk and the amount of data taken.

Another example, with an inordinately high tolerance interval, is
summarized in the table below. The standard used is 99.5 ± .2 percent
"goodness" or $q_A$ = .003 and $q_B$ = .007. The Type I errors used are
$\alpha_E = \alpha_S$ = .05. The last two columns present two points of the OC-curves.

    n       xA    xB    yA      yB      Ra             βE | p             βS | p

    500     -1    8     501     492     493-500        1.00 | .9985       .93 | .9915
    2000    1     21    1999    1979    1980-1998      .80 | .9985        .81 | .9915
    6000    10    54    5990    5946    5947-5989      .29 | .9985        .64 | .9915
    18000   41    146   17959   17854   17855-17958    .004 | .9985       .27 | .9915
    36000   90    279   35910   35721   35722-35909    .000003 | .9985    .06 | .9915


Figure 1: Transitions from actual conditions to rating decisions when one
hypothesis test is used. Horizontal transitions would have no
risks. Risks of changing B, W, or A are labeled with the
appropriate Type I or Type II risks.

Figure 2: Transitions from actual conditions to rating decisions for two
hypothesis tests. Horizontal transitions would have no risks.
Risks of changing B, W, or A are labeled with the appropriate
Type I or Type II risks.

Figure 3: The nine possible combinations of actual conditions and rating
decisions as viewed with one hypothesis test. The three blocks
with downward to the right shading represent correct decisions and
have no associated risks. The four blocks labeled with $\alpha$ and $\beta$
represent risks that are covered by the indicated Type I or Type II
risks. The two blocks that are unshaded and unlabeled have risks
that are not addressed by the test.

Figure 4: The nine possible combinations of actual conditions and rating
decisions as viewed with two hypothesis tests. The three unshaded
blocks represent correct decisions and have no associated risks.
The three blocks toward the upper-right have associated employee
risks because the decision is lower than actual conditions.
Conversely, the lower-left blocks have supervisor risks. Shading
that is upward to the right indicates that the block is covered by
a Type I risk. Conversely, downward to the left shading indicates
a Type II risk. Note that two blocks are double covered.


Appendix II. P-VALUE AND Q-VALUE INTERPRETATION


A  hypothesis  test  may  be  viewed  in  two  distinct  ways  after  the  data  has 
been  collected.  The  more  traditional  view  for  performance  appraisal  is  to 
merely  designate  above,  within,  or  below  tolerance  as  the  evaluation  for  an 
action/task.  A  more  informative  view  uses  p-values5  and  q-values"  to  indicate 
the  degree  to  which  the  performance  is  above,  within,  or  below  tolerance  on 
one  or  more  actions/tasks.  If  a  job  element  has  more  than  one  action/task  and 
at least one action/task is appraised using a hypothesis test, the supervisor
may  use  p-values  and  q-values  in  the  subjective  mapping  of  action/task  ratings 
into  the  job  element  rating.  This  appendix  presents  examples  of  the  p-value 
and  q-value  interpretation. 

If a supervisor uses a seemingly rigid hypothesis test with $p_L$ = .92,
$p_U$ = .96, $\alpha_E = \alpha_S$ = .05, n = 2000, $y_A$ = 1935, $y_B$ = 1819, $\beta_E$ = .0001 for
p = .98, and $\beta_S$ = .07 for p = .90, the actual appraisal for this action/task
can be quite flexible. Of course, the supervisor can insist that a
measurement of y such that $y \ge y_A$ is needed to result in an above tolerance
rating. However, a more flexible and informative interpretation might be made
as follows.

Suppose that y = 1930 is the measurement from the sample of n = 2000.
Since 1930 < 1935 = $y_A$, the narrow interpretation is that the employee is not
appraised as above tolerance even though 1930/2000 = .965 > .96 = $p_U$.




Since the point estimate of p is greater than $p_U$, the employee may well be
interested in how the test would have to be changed to just barely yield an
above tolerance appraisal. Assuming that $p_U$ and $p_L$ are unchanged, both $\alpha$ and
$\beta$ risks need to be changed to make $y_A$ = 1930.

The discrete nature of the binomial distribution makes it impossible to
state an exact replacement for $\alpha_E$ = .05. Actually, the "setting" of $\alpha_E$
"at" .05 really designates a range of .046 < $\alpha_E$ < .059 when n = 2000 and
$p_U$ = .96. For $y_A$ to be set at 1930, $\alpha_E$ must be in the range of
.138 < $\alpha_E$ < .166. Thus, the change needed to improve the rating requires an
increase in $\alpha_E$ by roughly a factor of three. The formal way to make this
statement is to say that (1) the p-value, as calculated from the data, is in
the range of .138 < p-value < .166 and (2) the p-value is roughly three times
the designed Type I risk.

The p-value presents one view of the data; the other view is presented by
q-values. Since there are many initially designed Type II risks with each $\beta_E$
corresponding to a value of p, there are many modified Type II risks when data
modifies the Type I risk to a p-value. Each modified risk is a q-value.
All of these q-values are needed for a complete description; they may be
displayed as the modified OC-curve shown in figure 1. The particular q-value
of interest corresponds to p = .965 or q = 1-p = .035 because that is the
point estimate provided by the measurement y = 1930 or x = n-y = 70. This
q-value is .47 and corresponds to a designed $\beta_E$ of .70. Thus, this q-value is
roughly two-thirds of the designed Type II risk.
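
The p-value range and the q-value quoted for this example can be approximated numerically. The Python sketch below works in the "badness" count x = n - y and assumes that the bounds are lower-tail binomial probabilities evaluated at q = 1 - .96, which is one reading of the text; the helper `binomial_cdf` and all variable names are illustrative.

```python
def binomial_cdf(x, n, q):
    """Return P(X <= x) for X ~ Binomial(n, q), with 0 < q < 1."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    term = (1.0 - q) ** n          # P(X = 0)
    total = term
    odds = q / (1.0 - q)
    for i in range(x):
        term *= (n - i) / (i + 1) * odds   # term is now P(X = i + 1)
        total += term
    return min(total, 1.0)


n = 2000
q_limit = 1.0 - 0.96           # tolerance limit on the fraction of "bad" events
x_designed = n - 1935          # designed test parameter, x = 65
x_observed = n - 1930          # observed count of "bad" events, x = 70

# Range of Type I risks that all lead to the designed test parameter.
designed_alpha_range = (binomial_cdf(x_designed, n, q_limit),
                        binomial_cdf(x_designed + 1, n, q_limit))

# Range of Type I risks that would move the test parameter to the observation.
p_value_range = (binomial_cdf(x_observed, n, q_limit),
                 binomial_cdf(x_observed + 1, n, q_limit))

# Type II risks at the point estimate q = x / n, designed and modified.
q_hat = x_observed / n
designed_beta = 1.0 - binomial_cdf(x_designed, n, q_hat)
q_value = 1.0 - binomial_cdf(x_observed, n, q_hat)

print(designed_alpha_range, p_value_range, designed_beta, q_value)
```

Comparing `p_value_range` with `designed_alpha_range`, and `q_value` with `designed_beta`, gives the two ratios discussed in the text.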

This particular example, and a couple of other examples which have within
or above tolerance test results, are summarized in the following table:

f|92 


■i°U  ~  »  .19  -  .33  »  1/4 


A  ^66  „  2.0  w  2 


S3  31  -  42  «  37 


,950  • 988  ^  "*991  '-eJ.  17  22  «  20 

.9999 

■-•  v  f'*.  V.v--.  ;  t»Ur  f  ,4  .}•  *' 

.046  -  .059 


.988  -  .991  2  09 

.473  * 


2.10  «  2.1 


W  -47L.  «  .47  «  1/2 


In the above table, the greater than unity entries for the ratio of q-value
to p-value support a final decision that the performance is above tolerance.
Conversely, the greater than unity entry of p-value to q-value supports a
decision of within tolerance. The magnitude of these ratios indicates the
strength of this support.

The following table shows examples which have within or below tolerance ratings
from the strict interpretation of hypotheses tests. A final rating of within
tolerance is supported by a p-value/q-value ratio greater than unity. Conversely,
q-value/p-value ratios greater than unity support a final rating of below tolerance:

The interpretation of the last column in both of the above tables is that
the within tolerance rating is supported by a ratio of p-value/q-value that is
greater than unity. This results from taking the within tolerance state as
the null hypothesis. To support rejecting the null hypothesis and rating the
performance as either above or below tolerance, the q-value to p-value ratio
must be greater than unity.

The bottom row in both tables is for the same measurement. This value of
y, 1900, is closely within the $y_A$ + 1 to $y_B$ - 1 range, 1936 to 1818, which
indicates neither above nor below tolerance. Both p-value to q-value ratios
are greater than unity and support a final rating of within tolerance. The
fact that p-value/q-value ratios are essentially equal for the two tables
might be unexpected since 1900 is further from $y_B$ = 1819 than $y_A$ = 1935. This
is a consequence of having both $p_L$ and $p_U$ near unity; the binomial
distribution is not symmetrical.

Each  row  in  the  above  tables  may  be  used  to  appraise  performance  on  an 
individual  task/action.  Combinations  of  rows  may  be  used  in  the  subjective 
appraisal  of  a  job  element  which  contains  several  tasks/actions.  Naturally, 
this  subjective  appraisal  must  include  all  tasks/actions  in  the  job  element 
whether  or  not  they  are  treated  with  a  hypothesis  test. 

As  elementary  examples  of  appraising  a  job  element  as  exceeded,  met,  or 
not  met,  consider  a  job  element  which  has  only  two  tasks/actions.  Assume  that 
both are treated with hypothesis tests. If the two p-value to q-value ratios
are those in the y = 1940 and y = 1900 lines of the above tables, the
supervisor  may  well  subjectively  decide  on  an  exceeded  rating.  On  the  other 
hand,  there  would  be  less  support  of  an  exceeded  rating  if  the  ratio  were  from 
the  y  =  1930  and  y  =  1900  lines  or  the  y  =  1940  and  y  =  1830  lines. 

Clearly,  the  supervisor's  subjective  decision  becomes  more  complicated  as 
the  number  of  tasks/actions  is  increased.  For  example,  a  job  element  may  have 
(1)  a  couple  of  tasks/actions  not  treated  with  hypothesis  tests  but  judged 
within  tolerance  and  (2)  three  tasks/actions  with  p-value  to  q-value  ratios 
corresponding  to  those  in  lines  of  y  =  1930,  y  =  1900,  and  y  =  1830.  This 
example  has  fairly  strong  justification  for  a  met  rating.  On  the  other  hand, 
replacing  the  y  =  1930  line  with  the  y  =  1818  line  would  make  a  met  appraisal 
more  difficult  to  support. 

In  any  nontrivial  situation,  the  use  of  a  hypothesis  test  on  one  or  more 
task/action  will  not  provide  the  supervisor  with  an  automatic  decision.  The 
use  of  p-values  and  q-values  will,  however,  guide  the  supervisor  in  the 
necessary  subjective  decision.  Ignoring  the  p-values  and  q-values  would  be 
indefensible because that would deprive the manager of objective information.


ABSTRACT

A  contingency  table  is  a  presentation  of  count  data  resulting  from 
cross-classifications.  For  this  type  of  data  there  are  many  models 
available  to  aid  in  the  explanation  of  the  relationships  of  the 
corresponding  variables.  The  choice  of  an  appropriate  or,  perhaps,  the 
most  appropriate  model  depends  on  a  number  of  factors  including  both  the 
generating  sampling  model  and  the  hypotheses  to  be  considered.  The  purpose 
of  this  paper  is  to  describe  some  of  these  explanatory  models  and  provide 
some  recommendations  for  their  use. 

INTRODUCTION 

The  cross-classifications  of  a  contingency  table  are  variables, 
factors,  or  responses  which  have  a  number  of  levels  or  categories.  Terms 
used  synonymously  for  this  type  of  data  are  cross-classified, 
cross-tabulated,  categorical,  qualitative,  or  frequency  data.  These  data 
are  the  result  of  cross-classifying  a  population,  or  sample  from  a 
population, and accumulating totals for each "cell" of the contingency
table.  A  cell  total,  then,  is  the  number  of  observations  from  the 
population  or  sample  that  fall  into  the  categorical  combination  represented 
by  that  cell.  The  table  summarizes  information  for  the  entire  population 
or  sample,  where  every  observation  is  categorized  into  one  and  only  one 
cell.

A two-dimensional (two-way), r x s contingency table has two variables:
one variable having r categories and one variable having s categories. The
"complete" cross-classification gives a total of r·s cells. The following
notation for a two-way, r x s table will be used:

$\{x_{ij}\}$ = table of observed values;

$\{p_{ij}\}$ = table of cell probabilities;

$\{m_{ij}\}$ = table of expected values;

$\sum_{j=1}^{s} x_{ij} = x_{i\cdot}$ = observed row marginals, i = 1,2,...,r;

$\sum_{i=1}^{r} x_{ij} = x_{\cdot j}$ = observed column marginals, j = 1,2,...,s;

$\sum_{i=1}^{r} \sum_{j=1}^{s} x_{ij} = x_{\cdot\cdot} = N$ = total sample size or population.

The marginal probabilities ($p_{i\cdot}$, $p_{\cdot j}$) and marginal expected values ($m_{i\cdot}$, $m_{\cdot j}$)
are similarly defined. This notation is easily extended to higher-way
tables (tables with more than two variables) simply by adding more
subscripts.

The  primary  purpose  in  developing  models  for  contingency  table  data  is 
to  help  in  the  determination,  interpretation,  and  explanation  of  the 
relationships  among  the  variables.  Beginning  with  Pearson  (1900), 
statistical  techniques  have  been  developed  and  used  to  test  for  these 
variable  relationships,  but  only  recently  has  the  focus  been  on  the  use  of 
models.  Statistical  techniques  in  support  of  models  have  now  been 
well-developed.  Specialized  statistical  computer  packages  for  contingency 
table models (e.g. ECTA-Goodman and Fay 1973, CONTAB-Zahn 1976, and
GENCAT-Landis et al. 1976) have been available for some time and the
currently popular general statistical packages (SPSS, BMDP, SAS) have
contingency table data models and associated statistical techniques.

The use of these models and computer packages provides flexibility in
the analysis of various types of problems including those with many variables
and  complicated  structures  that  a  few  years  ago  would  have  been  impossible 
to  analyze.  The  models  provide  the  same  ease  of  interpretation  that  the 
linear  models  of  ANOVA  and  regression  provide.  In  fact,  the  interpretations 
of  the  parameters  of  the  contingency  table  models  are  often  analogous  to 
corresponding  parameters  in  ANOVA  and  regression  models.  Also,  contingency 
table  models  allow  for  classic  model  building  in  a  manner  similar  to 
stepwise  regression. 

MODELS 

The  models  available  for  contingency  table  data  are  many  and  varied  and 
often  have  specialized  use.  The  models  having  most  universal  appeal  and  to 
be  discussed  in  this  paper  are  the  log-linear  and  logit  models.  Other 
models  include  an  additive  model  (Bhapkar  and  Koch  1968),  the  Lancaster 
(1949,  1950,  1969)  partitioning  model,  and  a  general  linear  model  (Nelder 
and  Wedderburn  1972  and  Nelder  1974)  with  the  log-linear  model  as  a  special 
case.  The  additive  model  has  been  used  for  special  problems  such  as  sample 
surveys,  drug  comparisons,  and  biological  assays  (e.g.,  see  Johnson  and 
Koch  1970  and  Koch  and  Reinfurt  1971).  Johnson  and  Koch  discuss  the 
advantages  of  the  additive  model  for  sample  survey  data.  In  general,  the 
log-linear  and  logit  models  are  the  most  extensively  used,  providing 
convenient  parameters  for  most  hypothesis  testing  situations.  An  excellent 
discussion  and  comparison  of  the  corresponding  additive  and  multiplicative 
interaction  terms  for  the  additive  and  log-linear  models,  respectively,  is 
given by Darroch (1974).

The log-linear model is most convenient for general independence-type
hypothesis testing situations under Poisson or multinomial sampling. As a
motivating example, consider a 2 x 2 contingency table. The classic
concept of independence requires that

$p_{ij} = p_{i\cdot}\, p_{\cdot j}$, i = 1,2; j = 1,2.

A single parameter measuring this interaction is Yule's (1900)
cross-product ratio

$\alpha = \frac{p_{11}\, p_{22}}{p_{12}\, p_{21}}$.

Independence exists when this ratio is equal to one. Taking the logarithm
of $\alpha$ under independence,

$\ln \alpha = \ln p_{11} - \ln p_{12} - \ln p_{21} + \ln p_{22} = 0$, (1)

we can see the motivation in using a log-linear model: a zero-valued
parameter would imply independence.
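
The behavior of the cross-product ratio can be checked numerically. The short Python sketch below builds one table that satisfies independence and one that does not; the function names and the table entries are made up for illustration.

```python
import math


def cross_product_ratio(p):
    """Yule's cross-product ratio for a 2 x 2 table of cell probabilities."""
    (p11, p12), (p21, p22) = p
    return (p11 * p22) / (p12 * p21)


def independent_table(row_probs, col_probs):
    """Build cell probabilities as products of the row and column marginals."""
    return [[r * c for c in col_probs] for r in row_probs]


independent = independent_table([0.3, 0.7], [0.4, 0.6])
dependent = [[0.4, 0.1], [0.1, 0.4]]

# The logarithm is zero (up to rounding) only for the independent table.
log_alpha_independent = math.log(cross_product_ratio(independent))
log_alpha_dependent = math.log(cross_product_ratio(dependent))
print(log_alpha_independent, log_alpha_dependent)
```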

The general log-linear model most frequently used was presented by
Birch (1963). For an r x s table the model is

$\ell_{ij} = \ln p_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}$, i = 1,2,...,r; j = 1,2,...,s. (2)

This model is over-parameterized in that there are r + s + (r·s) + 1
parameters for r·s cells. Analogous to ANOVA, the constraints

$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{i} u_{12(ij)} = \sum_{j} u_{12(ij)} = 0$ (3)

are conveniently imposed. As an example, for the 2x2 table the
constraints allow a reparametrization of the model in equation (2) by
$u_1 = u_{1(1)}$, $u_2 = u_{2(1)}$, and $u_{12} = u_{12(11)}$:

$\ell_{11} = u + u_1 + u_2 + u_{12}$

$\ell_{12} = u + u_1 - u_2 - u_{12}$

$\ell_{21} = u - u_1 + u_2 - u_{12}$

$\ell_{22} = u - u_1 - u_2 + u_{12}$ (4)

Now the "u" parameters can be determined uniquely in terms of the
logarithms of the probabilities. Specifically,

$u = \frac{1}{4}(\ell_{11} + \ell_{12} + \ell_{21} + \ell_{22})$

$u_1 = \frac{1}{4}(\ell_{11} + \ell_{12} - \ell_{21} - \ell_{22})$

$u_2 = \frac{1}{4}(\ell_{11} - \ell_{12} + \ell_{21} - \ell_{22})$

$u_{12} = \frac{1}{4}(\ell_{11} - \ell_{12} - \ell_{21} + \ell_{22})$ (5)



The "u" parameters of equations (2) through (5) have analogous
interpretations to the parameters of the linear model for ANOVA. In
particular, for the 2x2 model of equations (4) and (5), u is the average
of the logarithms of the probabilities, $u_1$ is the average differences
across the first variable levels, and $u_2$ is the average differences across
the second variable levels. As in ANOVA, $u_{12}$ is an interaction term, which
for the 2x2 table measures the dependence between the variables in the
sense of Yule's cross-product ratio $\alpha$ and, specifically, from equation (1)
equals 1/4 $\ln \alpha$. Most importantly, under independence or "no interaction",
$u_{12}$ equals zero.
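
The relation between the interaction term and the cross-product ratio is easy to confirm numerically. The Python sketch below computes the four "u" parameters of the 2x2 model from the logarithms of the cell probabilities; the function name and the example table are illustrative only.

```python
import math


def birch_parameters(p):
    """Return (u, u1, u2, u12) for a 2 x 2 table of cell probabilities."""
    l11, l12 = math.log(p[0][0]), math.log(p[0][1])
    l21, l22 = math.log(p[1][0]), math.log(p[1][1])
    u = (l11 + l12 + l21 + l22) / 4.0
    u1 = (l11 + l12 - l21 - l22) / 4.0
    u2 = (l11 - l12 + l21 - l22) / 4.0
    u12 = (l11 - l12 - l21 + l22) / 4.0
    return u, u1, u2, u12


table = [[0.4, 0.1], [0.1, 0.4]]
u, u1, u2, u12 = birch_parameters(table)

# Cross-product ratio of the same table, for comparison with 4 * u12.
alpha = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])
print(u12, 0.25 * math.log(alpha))
```

Adding the four parameters with the signs of the 2x2 model recovers each log probability, which is a convenient check on any implementation.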

Another useful form of the log-linear model and one frequently
overlooked in the literature was first presented by Ku, Varner, and
Kullback (1968) and has been used primarily by Kullback and his associates.
Instead of the constraints in (3), Kullback fixes one cell of the
contingency table and defines the parameters to measure, for each variable
and interaction, a difference from this fixed cell. For the 2x2 table
with cell 22 fixed, the model is

$\ell_{11} = \tau_0 + \tau_1 + \tau_2 + \tau_{12}$

$\ell_{12} = \tau_0 + \tau_1$

$\ell_{21} = \tau_0 + \tau_2$

$\ell_{22} = \tau_0$. (6)

Solving for the new $\tau$ parameters,

$\tau_0 = \ell_{22}$

$\tau_1 = \ell_{12} - \ell_{22}$

$\tau_2 = \ell_{21} - \ell_{22}$

$\tau_{12} = \ell_{11} - \ell_{12} - \ell_{21} + \ell_{22}$. (7)

In terms of the Birch model "u" parameters,

$\tau_0 = u - u_1 - u_2 + u_{12}$

$\tau_1 = 2(u_1 - u_{12})$

$\tau_2 = 2(u_2 - u_{12})$

$\tau_{12} = 4u_{12}$. (8)

The important interaction parameter $\tau_{12}$ is proportional to Birch's $u_{12}$ and
both reflect independence for values of zero.
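
The correspondence between the two parametrizations can be verified on any 2x2 table with positive cells. The Python sketch below computes both sets of parameters from the log probabilities and compares them; the function names and the example table are hypothetical.

```python
import math


def log_cells(p):
    """Return the logarithms (l11, l12, l21, l22) of a 2 x 2 probability table."""
    return (math.log(p[0][0]), math.log(p[0][1]),
            math.log(p[1][0]), math.log(p[1][1]))


def birch_parameters(p):
    """Parameters defined as deviations from the overall mean log probability."""
    l11, l12, l21, l22 = log_cells(p)
    u = (l11 + l12 + l21 + l22) / 4.0
    u1 = (l11 + l12 - l21 - l22) / 4.0
    u2 = (l11 - l12 + l21 - l22) / 4.0
    u12 = (l11 - l12 - l21 + l22) / 4.0
    return u, u1, u2, u12


def kullback_parameters(p):
    """Parameters defined as differences from the fixed cell 22."""
    l11, l12, l21, l22 = log_cells(p)
    return l22, l12 - l22, l21 - l22, l11 - l12 - l21 + l22


table = [[0.1, 0.2], [0.3, 0.4]]
u, u1, u2, u12 = birch_parameters(table)
tau0, tau1, tau2, tau12 = kullback_parameters(table)

# Each pair printed below should agree up to rounding.
print(tau0, u - u1 - u2 + u12)
print(tau1, 2.0 * (u1 - u12))
print(tau2, 2.0 * (u2 - u12))
print(tau12, 4.0 * u12)
```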

It is interesting to recognize the similarity between these models and
models for ANOVA. Similar to Birch's log-linear model, the usual linear
model for ANOVA defines an overall mean parameter, and measures factor
effects as differences from this mean. On the other hand, similar to
Kullback's log-linear model, the regression model for ANOVA fixes one factor
level, and defines the regression coefficients as the differences of the
other factors from this fixed level.

In  addition  to  log-linear  models,  logit  models  are  also  very  popular 
for  certain  applications  of  contingency  tables.  In  particular,  for 
product-multinomial  sampling  with  homogeneity-type  hypotheses  and  one  or 
more  response  variables,  logit  models  are  very  useful.  For  example, 
consider the factor and response problem depicted in Figure 1.

        p11     p12

        p21     p22

Figure 1. Factor A, Response B

Here, A is the factor at 2 levels and B is the binomial response. The
homogeneity hypothesis would be $H_0$: $p_{11} = p_{21}$ (or $p_{12} = p_{22}$). Under $H_0$,
the log-linear models would require that two parameters equal zero, namely
from (5) and (7),

Birch: $u_1 = u_{12} = 0$

Kullback: $\tau_1 = \tau_{12} = 0$.

Yet, the homogeneity hypothesis is a one degree-of-freedom test and a
convenient model should provide a single corresponding zero-valued
parameter. Defining the logit $L_i = \ln(p_{i1}/p_{i2})$ for i = 1,2,

$L_1 = \ln(p_{11}/p_{12}) = \ln p_{11} - \ln p_{12} = 2u_2 + 2u_{12}$

and

$L_2 = \ln(p_{21}/p_{22}) = \ln p_{21} - \ln p_{22} = 2u_2 - 2u_{12}$.

Letting $w = 2u_2$ and $w_1 = 2u_{12}$,

$L_1 = w + w_1$

$L_2 = w - w_1$ (9)

and

$w = \frac{1}{2}(L_1 + L_2)$

$w_1 = \frac{1}{2}(L_1 - L_2)$. (10)
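
Equations (9) and (10) can be illustrated with a small numerical example. The Python sketch below computes the logits and the two logit-model parameters for a pair of response distributions; the function name and the probabilities are made up for illustration, with each row taken to sum to one as under product-multinomial sampling.

```python
import math


def logit_parameters(rows):
    """rows[i] = (p_i1, p_i2); return (L1, L2, w, w1) as in equations (9) and (10)."""
    logit_1 = math.log(rows[0][0] / rows[0][1])
    logit_2 = math.log(rows[1][0] / rows[1][1])
    w = 0.5 * (logit_1 + logit_2)
    w1 = 0.5 * (logit_1 - logit_2)
    return logit_1, logit_2, w, w1


# Same response distribution at both factor levels: homogeneity holds.
homogeneous = [(0.7, 0.3), (0.7, 0.3)]
# Different response distributions: homogeneity fails.
heterogeneous = [(0.8, 0.2), (0.5, 0.5)]

print(logit_parameters(homogeneous)[3])
print(logit_parameters(heterogeneous)[3])
```

Only the single parameter w1 changes between the two cases in a way that reflects the hypothesis, which is the property the logit model is chosen for.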


Now, the single model parameter $w_1$ corresponds to the one degree-of-freedom
homogeneity hypothesis (i.e., $H_0$: $p_{11} = p_{21}$ <=> $w_1 = 0$).


LARGER TABLES


Extending these models to larger tables is relatively straightforward,
although some care is required to ensure clear definitions of the
parameters so that they will purposely relate to the hypotheses of concern.
Appendix A provides the models and hypotheses for the 2x3 table and
Appendix B for the three-way 2x2x2 table.

Initially, considering the 2x3 table, the independence hypothesis is a
two degree-of-freedom test and each log-linear model provides two convenient
parameters, $u_{12}$ and $u_{12}'$ for the Birch model and $\tau^{ij}_{11}$
and $\tau^{ij}_{12}$ for the Kullback model. In comparing the models, the
arbitrary fixing of a cell in the Kullback model may not appeal to some
analysts, but the relative simplicity of the model would certainly appeal to
all. The independence parameters for the Kullback model are also easier to
interpret. Let $\alpha_{mn}$ be the cross-product ratio of column m and
column n taken as a 2 x 2 table, that is,
$\alpha_{mn} = p_{1m}p_{2n}/(p_{1n}p_{2m})$. Independence occurs when the
three cross-product ratios $\alpha_{12}$, $\alpha_{13}$, and $\alpha_{23}$
are equal to one (any two $\alpha_{mn}$ equal to one will ensure that the
third is equal to one). The log-linear parameters relate to these
$\alpha_{mn}$ in the following manner:


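$$u_{12} = \tfrac{1}{6}\,(\ln \alpha_{12} + \ln \alpha_{13})$$

$$u_{12}' = \tfrac{1}{6}\,(-\ln \alpha_{12} + \ln \alpha_{23})$$

$$\tau^{ij}_{11} = \ln \alpha_{13}$$

$$\tau^{ij}_{12} = \ln \alpha_{23}.$$

The Kullback $\tau$ parameters are simply the logarithms of Yule's original
cross-product ratios for the 2x2 subtables that include the fixed cell.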
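
The link between the interaction parameters and the cross-product ratios can be checked numerically. The sketch below is an illustration only: the helper names and the example table are assumptions, the Birch and Kullback formulas are the parameter definitions of Appendix A, and the cross-product ratio of columns m and n is taken as p1m*p2n/(p1n*p2m).

```python
import math

def cross_ratio(p, m, n):
    """Cross-product ratio of columns m and n (1-based) of a 2x3 table."""
    return (p[0][m - 1] * p[1][n - 1]) / (p[0][n - 1] * p[1][m - 1])

def birch_interactions(p):
    """Birch interaction parameters (u12, u12') from the Appendix A formulas."""
    l = [[math.log(x) for x in row] for row in p]
    u12 = (2*l[0][0] - l[0][1] - l[0][2] - 2*l[1][0] + l[1][1] + l[1][2]) / 6
    u12_prime = (-l[0][0] + 2*l[0][1] - l[0][2] + l[1][0] - 2*l[1][1] + l[1][2]) / 6
    return u12, u12_prime

def kullback_interactions(p):
    """Kullback interaction parameters with cell 23 fixed."""
    l = [[math.log(x) for x in row] for row in p]
    t11 = l[0][0] - l[0][2] - l[1][0] + l[1][2]
    t12 = l[0][1] - l[0][2] - l[1][1] + l[1][2]
    return t11, t12

# Hypothetical 2x3 table of cell probabilities (sums to one).
table = [[0.10, 0.20, 0.15],
         [0.25, 0.10, 0.20]]
u12, u12_prime = birch_interactions(table)
t11, t12 = kullback_interactions(table)
a12 = cross_ratio(table, 1, 2)
a13 = cross_ratio(table, 1, 3)
a23 = cross_ratio(table, 2, 3)
```

For any table with positive cells, `t11` equals `log(a13)` and `t12` equals `log(a23)`, while the Birch parameters are sixths of sums and differences of the same logarithms. All four quantities vanish when the table is a product of its margins.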
The appropriate logit model depends on the sampling scheme. When the data
are sampled across the rows, it is convenient to build a model that
calculates logits based on ratios of row probabilities for each column.
This is reflected in the III.a. model of Appendix A. Symmetrically, when
the data are sampled across columns, it is convenient to build a model that
calculates logits based on ratios of column probabilities for each row.
This is reflected in the III.b. model of Appendix A. For the sampling
model in III.a., the corresponding homogeneity hypothesis is a two
degree-of-freedom test that compares the probabilities across a row. The
logit model provides the three parameters $w_1$, $w_2$, and $w_3$ and the
constraint that their sum equals zero. For the model in III.b., the
homogeneity hypothesis is a two degree-of-freedom test that compares the
probabilities across any two of the three columns. The logit model provides
three parameters (corresponding to the three columns), any two of which can
be used to test the hypothesis. It should be noted that other logit
parameterizations are possible.

Turning now to the three-way 2x2x2 table in Appendix B, the
comparative simplicity of the Kullback model is again apparent. In the
Kullback model the 222 cell has been fixed. The main effects
($\tau^{i}_{1}, \tau^{j}_{1}, \tau^{k}_{1}$) measure the difference between
the second and first levels of each variable as compared to the fixed cell.
The two-way interaction terms
($\tau^{ij}_{11}, \tau^{ik}_{11}, \tau^{jk}_{11}$) are the logarithms of the
three possible cross-product ratios with the 222 cell that measure
interaction between two variables with the third fixed. The three-way
interaction term ($\tau^{ijk}_{111}$) is the difference of the logarithms of
the cross-product ratios when variable one is fixed at level one compared to
level two.
The Birch model uses a mean parameter ($u$) which is the average of the
logarithms of the cell probabilities. The main effects ($u_1, u_2, u_3$)
average the difference in the logarithms of the probabilities at the two
levels of each variable, respectively. The interaction terms
($u_{12}, u_{13}, u_{23}$) average the logarithms of the cross-product ratios
corresponding to the two measured variables. The three-way interaction term
($u_{123}$) measures the same difference of logarithms of cross-product
ratios as does $\tau^{ijk}_{111}$, although it averages this difference
across the cells by taking 1/8 the value.
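
The statement that the Birch three-way term is one eighth of the Kullback three-way term can be confirmed directly from the Appendix B formulas. This is a small sketch under assumptions: the function names and the example probabilities are hypothetical, and cells are indexed by tuples (i, j, k) with levels 1 and 2.

```python
import math
from itertools import product

def birch_u123(p):
    """Birch three-way term: signed average of the log cell probabilities."""
    total = 0.0
    for i, j, k in product((1, 2), repeat=3):
        # Level 1 contributes +1 and level 2 contributes -1 for each variable.
        sign = (1 if i == 1 else -1) * (1 if j == 1 else -1) * (1 if k == 1 else -1)
        total += sign * math.log(p[(i, j, k)])
    return total / 8

def kullback_t_ijk(p):
    """Kullback three-way term: log cross-product ratio at level 1 of
    variable one minus the log cross-product ratio at level 2."""
    l = {cell: math.log(v) for cell, v in p.items()}
    at_level_1 = l[(1, 1, 1)] - l[(1, 1, 2)] - l[(1, 2, 1)] + l[(1, 2, 2)]
    at_level_2 = l[(2, 1, 1)] - l[(2, 1, 2)] - l[(2, 2, 1)] + l[(2, 2, 2)]
    return at_level_1 - at_level_2

# Hypothetical 2x2x2 table of cell probabilities (sums to one).
cells = {(1, 1, 1): 0.10, (1, 1, 2): 0.05, (1, 2, 1): 0.15, (1, 2, 2): 0.20,
         (2, 1, 1): 0.05, (2, 1, 2): 0.10, (2, 2, 1): 0.25, (2, 2, 2): 0.10}
u123 = birch_u123(cells)
t_ijk = kullback_t_ijk(cells)
```

Both parameters are zero or nonzero together, so either one can carry the no three-way interaction hypothesis.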

The presented logit model considers that variable one is a response
variable and that product-multinomial sampling is appropriate. The model
is analogous to the 2x2 Birch log-linear model; however, the parameters
($w_2, w_3, w_{23}$) measure the effect that the corresponding terms have on
the response variable.

Considering  the  hypotheses  for  the  2x2x2  table  as  listed  in 
paragraph  IV  of  Appendix  B,  the  no  three-way  interaction  hypothesis  is  a  one 
degree-of-freedom  test  and  each  model  provides  one  corresponding 
parameter.  The  logit  model  w23  parameter  (and  corresponding  hypothesis 
test)  is  more  properly  interpreted  as  a  measure  of  the  interaction  between 
variables  two  and  three  as  it  affects  variable  one.  The  mutual  independence 
test  under  multinomial  sampling  is  a  four  degree  of  freedom  test  and  the 
two  log-linear  models  provide  four  parameters  corresponding  to  each 
possible  interaction.  Under  product-multinomial  sampling  the  test  has 
three  degrees  of  freedom  and  the  logit  model  provides  three  parameters. 

The conditional independence test requires that one variable be considered
fixed and that independence between the other two variables be tested.
In Appendix B, variable three has been fixed. This is a two
degree-of-freedom test and each model provides two parameters. The
homogeneity test has many forms. The one chosen in Appendix B corresponds to
the selection of variable one as the response variable in the logit model.
Under complete homogeneity, all these logits and logit parameters are equal
to zero. In effect, the 2x2x2 table has collapsed to a 2 x 2 table with
variables two and three remaining. The terms of the log-linear models
relating to the first variable are also now zero.

CONCLUSION 


It  might  be  said  that  there  is  only  a  limited  amount  of  information 
available  from  any  given  data  set.  For  contingency  table  data,  the  models 
presented  in  this  paper  provide  the  means  to  fully  explain  the  data  with 
respect  to  the  measured  variables,  and  often  indicate  relationships  which 
might  not  have  been  apparent  with  other  techniques. 
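APPENDIX A. 2x3 TABLE MODELS


Cell probabilities:

    p11   p12   p13
    p21   p22   p23

I. Birch Log-linear Model

General:

$$\ell_{ij} = \ln p_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}; \quad i = 1,2;\ j = 1,2,3.$$

Define:

$$u_1 = u_{1(1)}, \quad u_2 = u_{2(1)}, \quad u_2' = u_{2(2)}, \quad u_{12} = u_{12(11)}, \quad u_{12}' = u_{12(12)}.$$

Model:

$$\begin{aligned}
\ell_{11} &= u + u_1 + u_2 + u_{12} \\
\ell_{12} &= u + u_1 + u_2' + u_{12}' \\
\ell_{13} &= u + u_1 - u_2 - u_2' - u_{12} - u_{12}' \\
\ell_{21} &= u - u_1 + u_2 - u_{12} \\
\ell_{22} &= u - u_1 + u_2' - u_{12}' \\
\ell_{23} &= u - u_1 - u_2 - u_2' + u_{12} + u_{12}'
\end{aligned}$$

Parameters:

$$\begin{aligned}
u &= \tfrac{1}{6}\,(\ell_{11} + \ell_{12} + \ell_{13} + \ell_{21} + \ell_{22} + \ell_{23}) \\
u_1 &= \tfrac{1}{6}\,(\ell_{11} + \ell_{12} + \ell_{13} - \ell_{21} - \ell_{22} - \ell_{23}) \\
u_2 &= \tfrac{1}{6}\,(2\ell_{11} - \ell_{12} - \ell_{13} + 2\ell_{21} - \ell_{22} - \ell_{23}) \\
u_2' &= \tfrac{1}{6}\,(-\ell_{11} + 2\ell_{12} - \ell_{13} - \ell_{21} + 2\ell_{22} - \ell_{23}) \\
u_{12} &= \tfrac{1}{6}\,(2\ell_{11} - \ell_{12} - \ell_{13} - 2\ell_{21} + \ell_{22} + \ell_{23}) \\
u_{12}' &= \tfrac{1}{6}\,(-\ell_{11} + 2\ell_{12} - \ell_{13} + \ell_{21} - 2\ell_{22} + \ell_{23})
\end{aligned}$$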
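II. Kullback Log-linear Model

Cell 23 fixed.

Model:

$$\begin{aligned}
\ell_{11} &= \tau_0 + \tau^{i}_{1} + \tau^{j}_{1} + \tau^{ij}_{11} \\
\ell_{12} &= \tau_0 + \tau^{i}_{1} + \tau^{j}_{2} + \tau^{ij}_{12} \\
\ell_{13} &= \tau_0 + \tau^{i}_{1} \\
\ell_{21} &= \tau_0 + \tau^{j}_{1} \\
\ell_{22} &= \tau_0 + \tau^{j}_{2} \\
\ell_{23} &= \tau_0
\end{aligned}$$

Parameters:

$$\begin{aligned}
\tau_0 &= \ell_{23} \\
\tau^{i}_{1} &= \ell_{13} - \ell_{23} \\
\tau^{j}_{1} &= \ell_{21} - \ell_{23} \\
\tau^{j}_{2} &= \ell_{22} - \ell_{23} \\
\tau^{ij}_{11} &= \ell_{11} - \ell_{13} - \ell_{21} + \ell_{23} \\
\tau^{ij}_{12} &= \ell_{12} - \ell_{13} - \ell_{22} + \ell_{23}
\end{aligned}$$
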
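III. Logit Model

a. $\sum_{i=1}^{2} p_{ij} = 1$ for $j = 1,2,3$

Define: $L_j = \ln(p_{1j}/p_{2j})$, $j = 1,2,3$

Model:

$$L_1 = w + w_1, \qquad L_2 = w + w_2, \qquad L_3 = w + w_3$$

Constraint:

$$w_1 + w_2 + w_3 = 0$$

Parameters:

$$\begin{aligned}
w &= \tfrac{1}{3}\,(L_1 + L_2 + L_3) \\
w_1 &= \tfrac{1}{3}\,(2L_1 - L_2 - L_3) \\
w_2 &= \tfrac{1}{3}\,(-L_1 + 2L_2 - L_3) \\
w_3 &= \tfrac{1}{3}\,(-L_1 - L_2 + 2L_3)
\end{aligned}$$

b. $\sum_{j=1}^{3} p_{ij} = 1$ for $i = 1,2$

Define: $L_{ij} = \ln\Big(p_{ij} \Big/ \sum_{k \ne j} p_{ik}\Big)$ for $i = 1,2$; $j = 1,2,3$

General: $L_{ij} = w + w_{i(j)}$

Constraints: $\sum_{i} w_{i(j)} = 0$ for $j = 1,2,3$

Define: $w_1 = w_{1(1)}$, $w_2 = w_{1(2)}$, $w_3 = w_{1(3)}$

Model:

$$\begin{aligned}
L_{11} &= w + w_1 & L_{21} &= w - w_1 \\
L_{12} &= w + w_2 & L_{22} &= w - w_2 \\
L_{13} &= w + w_3 & L_{23} &= w - w_3
\end{aligned}$$

Parameters:

$$\begin{aligned}
w &= \tfrac{1}{6} \sum_{i}\sum_{j} L_{ij} \\
w_1 &= \tfrac{1}{2}\,(L_{11} - L_{21}) \\
w_2 &= \tfrac{1}{2}\,(L_{12} - L_{22}) \\
w_3 &= \tfrac{1}{2}\,(L_{13} - L_{23})
\end{aligned}$$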
IV. Hypotheses


1. Independence $H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$, $i = 1,2$; $j = 1,2,3$

Birch $H_0: u_{12} = u_{12}' = 0$

Kullback $H_0: \tau^{ij}_{11} = \tau^{ij}_{12} = 0$

2. Homogeneity

a. $H_0: p_{11} = p_{12} = p_{13}$ ($\Rightarrow p_{21} = p_{22} = p_{23}$)

Birch $H_0: u_2 = u_2' = u_{12} = u_{12}' = 0$

Kullback $H_0: \tau^{j}_{1} = \tau^{j}_{2} = \tau^{ij}_{11} = \tau^{ij}_{12} = 0$

Logit $H_0: w_1 = w_2 = w_3 = 0$

b. $H_0: p_{11} = p_{21},\ p_{12} = p_{22}$ ($\Rightarrow p_{13} = p_{23}$)

Birch $H_0: u_1 = u_{12} = u_{12}' = 0$

Kullback $H_0: \tau^{i}_{1} = \tau^{ij}_{11} = \tau^{ij}_{12} = 0$

Logit $H_0: w_1 = w_2 = w_3 = 0$



APPENDIX B. 2x2x2 TABLE MODELS


Cell probabilities:

    p111   p112
    p121   p122
    p211   p212
    p221   p222

I. Birch Log-linear Model

General:

$$\ell_{ijk} = \ln p_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)};$$
$$i = 1,2;\ j = 1,2;\ k = 1,2.$$

Define:

$$u_1 = u_{1(1)}, \quad u_2 = u_{2(1)}, \quad u_3 = u_{3(1)}, \quad u_{12} = u_{12(11)}, \quad u_{13} = u_{13(11)}, \quad u_{23} = u_{23(11)}, \quad u_{123} = u_{123(111)}.$$

Model:

$$\begin{aligned}
\ell_{111} &= u + u_1 + u_2 + u_3 + u_{12} + u_{13} + u_{23} + u_{123} \\
\ell_{112} &= u + u_1 + u_2 - u_3 + u_{12} - u_{13} - u_{23} - u_{123} \\
\ell_{121} &= u + u_1 - u_2 + u_3 - u_{12} + u_{13} - u_{23} - u_{123} \\
\ell_{122} &= u + u_1 - u_2 - u_3 - u_{12} - u_{13} + u_{23} + u_{123} \\
\ell_{211} &= u - u_1 + u_2 + u_3 - u_{12} - u_{13} + u_{23} - u_{123} \\
\ell_{212} &= u - u_1 + u_2 - u_3 - u_{12} + u_{13} - u_{23} + u_{123} \\
\ell_{221} &= u - u_1 - u_2 + u_3 + u_{12} - u_{13} - u_{23} + u_{123} \\
\ell_{222} &= u - u_1 - u_2 - u_3 + u_{12} + u_{13} + u_{23} - u_{123}
\end{aligned}$$

Parameters:

$$\begin{aligned}
u &= \tfrac{1}{8}\,(\ell_{111} + \ell_{112} + \ell_{121} + \ell_{122} + \ell_{211} + \ell_{212} + \ell_{221} + \ell_{222}) \\
u_1 &= \tfrac{1}{8}\,(\ell_{111} + \ell_{112} + \ell_{121} + \ell_{122} - \ell_{211} - \ell_{212} - \ell_{221} - \ell_{222}) \\
u_2 &= \tfrac{1}{8}\,(\ell_{111} + \ell_{112} - \ell_{121} - \ell_{122} + \ell_{211} + \ell_{212} - \ell_{221} - \ell_{222}) \\
u_3 &= \tfrac{1}{8}\,(\ell_{111} - \ell_{112} + \ell_{121} - \ell_{122} + \ell_{211} - \ell_{212} + \ell_{221} - \ell_{222}) \\
u_{12} &= \tfrac{1}{8}\,(\ell_{111} + \ell_{112} - \ell_{121} - \ell_{122} - \ell_{211} - \ell_{212} + \ell_{221} + \ell_{222}) \\
u_{13} &= \tfrac{1}{8}\,(\ell_{111} - \ell_{112} + \ell_{121} - \ell_{122} - \ell_{211} + \ell_{212} - \ell_{221} + \ell_{222}) \\
u_{23} &= \tfrac{1}{8}\,(\ell_{111} - \ell_{112} - \ell_{121} + \ell_{122} + \ell_{211} - \ell_{212} - \ell_{221} + \ell_{222}) \\
u_{123} &= \tfrac{1}{8}\,(\ell_{111} - \ell_{112} - \ell_{121} + \ell_{122} - \ell_{211} + \ell_{212} + \ell_{221} - \ell_{222})
\end{aligned}$$

II. Kullback Log-linear Model

Define: Cell 222 fixed

Model:

$$\begin{aligned}
\ell_{111} &= \tau_0 + \tau^{i}_{1} + \tau^{j}_{1} + \tau^{k}_{1} + \tau^{ij}_{11} + \tau^{ik}_{11} + \tau^{jk}_{11} + \tau^{ijk}_{111} \\
\ell_{112} &= \tau_0 + \tau^{i}_{1} + \tau^{j}_{1} + \tau^{ij}_{11} \\
\ell_{121} &= \tau_0 + \tau^{i}_{1} + \tau^{k}_{1} + \tau^{ik}_{11} \\
\ell_{122} &= \tau_0 + \tau^{i}_{1} \\
\ell_{211} &= \tau_0 + \tau^{j}_{1} + \tau^{k}_{1} + \tau^{jk}_{11} \\
\ell_{212} &= \tau_0 + \tau^{j}_{1} \\
\ell_{221} &= \tau_0 + \tau^{k}_{1} \\
\ell_{222} &= \tau_0
\end{aligned}$$

Parameters:

$$\begin{aligned}
\tau_0 &= \ell_{222} \\
\tau^{i}_{1} &= \ell_{122} - \ell_{222} \\
\tau^{j}_{1} &= \ell_{212} - \ell_{222} \\
\tau^{k}_{1} &= \ell_{221} - \ell_{222} \\
\tau^{ij}_{11} &= \ell_{112} - \ell_{122} - \ell_{212} + \ell_{222} \\
\tau^{ik}_{11} &= \ell_{121} - \ell_{122} - \ell_{221} + \ell_{222} \\
\tau^{jk}_{11} &= \ell_{211} - \ell_{212} - \ell_{221} + \ell_{222} \\
\tau^{ijk}_{111} &= \ell_{111} - \ell_{112} - \ell_{121} + \ell_{122} - \ell_{211} + \ell_{212} + \ell_{221} - \ell_{222}
\end{aligned}$$


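III. Logit Model

General:

$$L_{jk} = \ln\!\left(\frac{p_{1jk}}{p_{2jk}}\right) = w + w_{2(j)} + w_{3(k)} + w_{23(jk)}; \quad j = 1,2;\ k = 1,2,$$

where

$$\sum_{j} w_{2(j)} = \sum_{k} w_{3(k)} = \sum_{j} w_{23(jk)} = \sum_{k} w_{23(jk)} = 0.$$

Define:

$$w_2 = w_{2(1)}, \quad w_3 = w_{3(1)}, \quad w_{23} = w_{23(11)}.$$

Model:

$$\begin{aligned}
L_{11} &= w + w_2 + w_3 + w_{23} \\
L_{12} &= w + w_2 - w_3 - w_{23} \\
L_{21} &= w - w_2 + w_3 - w_{23} \\
L_{22} &= w - w_2 - w_3 + w_{23}
\end{aligned}$$

Parameters:

$$\begin{aligned}
w &= \tfrac{1}{4}\,(L_{11} + L_{12} + L_{21} + L_{22}) \\
w_2 &= \tfrac{1}{4}\,(L_{11} + L_{12} - L_{21} - L_{22}) \\
w_3 &= \tfrac{1}{4}\,(L_{11} - L_{12} + L_{21} - L_{22}) \\
w_{23} &= \tfrac{1}{4}\,(L_{11} - L_{12} - L_{21} + L_{22})
\end{aligned}$$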

IV. Hypotheses

1. No three-way Interaction (No second order Interaction)

Birch $H_0: u_{123} = 0$

Kullback $H_0: \tau^{ijk}_{111} = 0$

Logit $H_0: w_{23} = 0$

2. Mutual (Complete) Independence

Birch $H_0: u_{12} = u_{13} = u_{23} = u_{123} = 0$

Kullback $H_0: \tau^{ij}_{11} = \tau^{ik}_{11} = \tau^{jk}_{11} = \tau^{ijk}_{111} = 0$

Logit $H_0: w_2 = w_3 = w_{23} = 0$

3. Conditional Independence (1 to 2 with 3 fixed)

Birch $H_0: u_{12} = u_{123} = 0$

Kullback $H_0: \tau^{ij}_{11} = \tau^{ijk}_{111} = 0$

Logit $H_0: w_2 = w_{23} = 0$

4. Homogeneity of Tables

Birch $H_0: u_1 = u_{12} = u_{13} = u_{123} = 0$

Kullback $H_0: \tau^{i}_{1} = \tau^{ij}_{11} = \tau^{ik}_{11} = \tau^{ijk}_{111} = 0$

Logit $H_0: w = w_2 = w_3 = w_{23} = 0$
On a Class of Probability Density Functions
H.  P.  Dudel  and  S.  H.  Lehnigk 
U.S.  Army  Missile  Command 

Research,  Development,  and  Engineering  Center 

Research  Directorate 

Redstone  Arsenal,  Alabama  35090-5243 


SUMMARY 

The application of a three-parameter class of one-sided probability
distributions is discussed. For specific parameter values, this class
contains as special cases a number of well-known distributions of statistics
and statistical physics, namely, Gauss, Weibull, exponential, Rayleigh,
Gamma, chi-square, Maxwell, and Wien (limiting case of Planck's
distribution). One of the three parameters represents scale; the other two
represent initial and terminal shape of the associated probability density
function. A fourth parameter, shift, may be introduced. The distribution
class discussed in this paper was introduced by L. Amoroso [2] in 1924. It
is closely connected with a family of linear Fokker-Planck equations
(generalized Feller equation).

In fact, the class of probability density functions associated with the
distribution class considered here is a special case of the set of all delta
function initial condition solutions of the generalized Feller equation for
a fixed value of the time variable. It will be shown that, as a function of
the logarithm of the independent variable, the logarithm of the cumulative
distribution function is asymptotically linear as the independent variable
approaches zero from above. This fact leads to a general criterion for the
applicability of the presented distribution family relative to given
empirical data. The applicability criterion can be used to determine
approximate values for the two shape parameters. They can subsequently be
used as initial values in any of the established parameter estimation
techniques.
1. A Class of Distributions


A number of basic continuous distributions of classical statistics
and statistical physics are special cases of a class of distributions which
is characterized by the cumulative distribution function

$$F(x) = \begin{cases} \dfrac{1}{\Gamma(1+q)}\,\gamma(1+q,\ \xi^{1-\lambda}), & \xi = x b^{-1},\ q = (\lambda - p)(1-\lambda)^{-1},\ x > 0, \\[1ex] 0, & x \le 0, \end{cases} \tag{1.1}$$

which depends on the three mutually independent parameters $b > 0$, $p < 1$,
and $\lambda < 1$. With these restrictions on the parameters $p$ and
$\lambda$, the composite quantity $q = (\lambda - p)(1-\lambda)^{-1}$ will be
greater than $-1$. In standard terminology, $b$ is the scale parameter, and
there are two shape parameters, $\lambda$ and $p$, which are independent of
each other. A fourth parameter, the shift parameter $x_0$, may be introduced
by replacing $x$ by $x - x_0$. The functions $\Gamma(y)$ and $\gamma(a,y)$ in
(1.1) are the Gamma and the incomplete Gamma functions, respectively.
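
As a rough numerical illustration of (1.1), the sketch below evaluates F(x) with only the Python standard library. It is not the authors' procedure: the function names are hypothetical, the parameter `lam` stands for the shape parameter lambda, and the incomplete Gamma function is computed from its standard power series, which is adequate for moderate arguments but is not a production-quality algorithm.

```python
import math

def lower_incomplete_gamma(a, y, terms=200):
    """gamma(a, y) = y**a * exp(-y) * sum_{n>=0} y**n / (a (a+1) ... (a+n))."""
    if y <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= y / (a + n)   # next term of the series
        total += term
    return y ** a * math.exp(-y) * total

def amoroso_cdf(x, b, p, lam):
    """Cumulative distribution function (1.1) for b > 0, p < 1, lam < 1."""
    if b <= 0 or p >= 1 or lam >= 1:
        raise ValueError("require b > 0, p < 1, lam < 1")
    if x <= 0:
        return 0.0
    q = (lam - p) / (1.0 - lam)
    y = (x / b) ** (1.0 - lam)
    return lower_incomplete_gamma(1.0 + q, y) / math.gamma(1.0 + q)
```

For p = 0 and lam = 0 the function reduces to the exponential distribution function 1 - exp(-x/b), and for p = lam = -1 to 1 - exp(-(x/b)**2), which gives two simple checks.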


By means of the integral definition of $\gamma(a,y)$ [1, 8.350.1], (1.1) can
be expressed in the form

$$F(x) = \frac{1}{\Gamma(1+q)} \int_0^{\xi^{1-\lambda}} t^{(1+q)-1} e^{-t}\,dt, \quad x > 0.$$

Since $\gamma(a,y)$ may also be defined by means of the degenerate
hypergeometric function $\Phi$ ($= {}_1F_1$) [1, 9.236.4, 9.210.1],

$$\gamma(a,y) = \frac{1}{a}\, y^{a}\, \Phi(a, a+1; -y),$$
we obtain a third expression for F(x),

$$F(x) = \frac{1}{\Gamma(2+q)}\, \xi^{1-p}\, \Phi(1+q,\ 2+q;\ -\xi^{1-\lambda}), \quad x > 0, \tag{1.2}$$

which will turn out to be quite useful later on.

The probability density function f(x) associated with the cumulative
distribution function F(x) is given by

$$f(x) = \frac{1-\lambda}{\Gamma(1+q)}\, b^{-1}\, \xi^{-p} \exp(-\xi^{1-\lambda}), \quad \xi = x b^{-1},\ q = (\lambda - p)(1-\lambda)^{-1},\ x > 0. \tag{1.3}$$

The distribution class defined by either the cumulative distribution
function (1.1) or the probability density function (1.3) was introduced by
L. Amoroso [2] in 1924 and reconsidered in later publications [3], [4], [5],
and [6].

Some other aspects of this density function class have been discussed in
[7] from a theoretical point of view. That paper contains remarks about the
associated probability measure space and the associated characteristic
function class. A more thorough discussion of the characteristic functions
from the point of view of complex function theory will be presented
elsewhere [8].

The class of density functions (1.3) contains the following special cases:
Gauss (normal), Weibull, exponential, Rayleigh, Gamma, chi-square, Maxwell,
and Wien, as has been pointed out in [7].
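
Two of the special cases can be checked with a few lines of code. The sketch below implements the density (1.3) directly. The function name and the parameter choices for the special cases (p = 0, lambda = 0 for the exponential; p = lambda = -1 with b equal to sigma times the square root of 2 for the Rayleigh) were worked out here from (1.3) and are not quoted from [7].

```python
import math

def amoroso_pdf(x, b, p, lam):
    """Density (1.3): (1-lam)/Gamma(1+q) * xi**(-p) * exp(-xi**(1-lam)) / b."""
    if b <= 0 or p >= 1 or lam >= 1:
        raise ValueError("require b > 0, p < 1, lam < 1")
    if x <= 0:
        return 0.0
    q = (lam - p) / (1.0 - lam)
    xi = x / b
    return (1.0 - lam) / math.gamma(1.0 + q) * xi ** (-p) * math.exp(-xi ** (1.0 - lam)) / b

def exponential_pdf(x, b):
    """Exponential density with mean b."""
    return math.exp(-x / b) / b

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale sigma."""
    return x / sigma ** 2 * math.exp(-x ** 2 / (2.0 * sigma ** 2))
```

With these choices, `amoroso_pdf(x, b, 0, 0)` agrees with `exponential_pdf(x, b)` and `amoroso_pdf(x, sigma * math.sqrt(2), -1, -1)` agrees with `rayleigh_pdf(x, sigma)`.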
2. The Moments


All moments of the distribution class characterized either by the
cumulative distribution F(x) given in (1.1) or by the associated density
function f(x) given in (1.3) exist, provided the parameters $b$, $\lambda$,
and $p$ are kept within the ranges $b > 0$, $\lambda < 1$, and $p < 1$.

The characteristic function associated with F(x) and f(x) is given by the
Laplace integral

$$\begin{aligned}
\Psi(s) &= \int_0^\infty f(x)\, e^{sx}\,dx
 = \frac{1-\lambda}{\Gamma(1+q)}\, b^{-1} \int_0^\infty \xi^{-p} \exp(-\xi^{1-\lambda} + sx)\,dx \\
 &= \frac{1-\lambda}{\Gamma(1+q)} \int_0^\infty \xi^{-p} \exp(-\xi^{1-\lambda} + sb\xi)\,d\xi, \quad \xi = x b^{-1},
\end{aligned} \tag{2.1}$$

where $s$ is a complex variable. The last integral in (2.1) converges for
Re $s < 0$ if $0 < \lambda < 1$, for Re $s < b^{-1}$ if $\lambda = 0$, and for
every $s$ if $\lambda < 0$.

Reference is made to [7] and, for a more detailed investigation, to [8]. It
follows that $\Psi(s)$ is holomorphic in the domain Re $s < 0$ if
$0 < \lambda < 1$, in Re $s < b^{-1}$ if $\lambda = 0$, and it is an entire
function if $\lambda < 0$. Therefore, for $\lambda \le 0$ the moments of our
distribution class are given by
$$m_n = \Psi^{(n)}(0) = \frac{b^n}{\Gamma(1+q)} \int_0^\infty t^{(n-p+1)(1-\lambda)^{-1} - 1} e^{-t}\,dt
 = \frac{b^n}{\Gamma(1+q)}\, \Gamma\!\left(1 + q + \frac{n}{1-\lambda}\right) \quad (n = 0,1,2,\ldots). \tag{2.2}$$

In particular, $m_0 = 1$, and the first moment, or mean $\mu$, is

$$\mu = m_1 = \frac{b}{\Gamma(1+q)}\, \Gamma\!\left(1 + q + \frac{1}{1-\lambda}\right). \tag{2.3}$$

If $\lambda$ is in the range $0 < \lambda < 1$, $\Psi(s)$ is not holomorphic
at $s = 0$. There is no power series expansion about $s = 0$. The moments in
this situation may still be defined, however, by (2.2) as
$\lim \Psi^{(n)}(s_0)$, Re $s_0 < 0$, as $s_0 \to 0$ two-dimensionally in
the left-hand s-plane. Of course, one may alternatively use the definition
of the moments in the form

$$m_n = \int_0^\infty x^n f(x)\,dx \quad (n = 0,1,2,\ldots)$$

for $0 < \lambda < 1$.
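
The closed form (2.2) is easy to evaluate. The following sketch is illustrative only: the function name is hypothetical and `lam` denotes lambda. It uses log-Gamma values to avoid overflow and can be checked against the familiar moments of the exponential and Rayleigh distributions.

```python
import math

def amoroso_moment(n, b, p, lam):
    """n-th moment from (2.2): b**n * Gamma(1 + q + n/(1-lam)) / Gamma(1 + q)."""
    if b <= 0 or p >= 1 or lam >= 1:
        raise ValueError("require b > 0, p < 1, lam < 1")
    q = (lam - p) / (1.0 - lam)
    log_ratio = math.lgamma(1.0 + q + n / (1.0 - lam)) - math.lgamma(1.0 + q)
    return b ** n * math.exp(log_ratio)

# Exponential case (p = 0, lam = 0) with scale b = 2: mean 2, second moment 8.
mean_exponential = amoroso_moment(1, 2.0, 0.0, 0.0)
second_exponential = amoroso_moment(2, 2.0, 0.0, 0.0)

# Rayleigh case (p = lam = -1, b = sigma * sqrt(2)) with sigma = 1.5.
mean_rayleigh = amoroso_moment(1, 1.5 * math.sqrt(2.0), -1.0, -1.0)
```

The Rayleigh mean obtained this way equals sigma times the square root of pi/2, the textbook value.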


3. An Associated Differential Equation


From an application point of view the usefulness of the distribution
function defined in Section 1 lies in the fact that it contains two
independent shape parameters, $p$ and $\lambda$, which allows fitting initial
and terminal shapes (in the direction of increasing x) of given distribution
data independently. However, there is another aspect which may very well be
of fundamental theoretical interest.
The class of density functions (1.3) is closely connected to a class of
Fokker-Planck equations. By fiat this connection then is typical for all
of the special cases listed in Section 1. It makes it possible to investigate
the underlying probabilistic features of the function class (1.3) and its
special cases by employing the machinery of probability theory.

Disregarding statistical considerations completely at this point, one
may ask the question: what is the most general one-dimensional autonomous
parabolic (Fokker-Planck) equation

$$\frac{\partial}{\partial x}\left[ A(x)\,\frac{\partial z}{\partial x} + D(x)\,z \right] - \frac{\partial z}{\partial t} = 0, \quad z = z(x,t),\ x > 0,\ t > 0, \tag{3.1}$$

which admits a similarity solution

$$z_0(x,t) = b^{-1}(t)\, f^*(\xi), \quad \xi = x\, b^{-1}(t), \tag{3.2}$$

which is conservative, i.e., for which

$$\int_0^\infty z_0(x,t)\,dx = 1?$$

This question is an important one in the attempt to model diffusion
processes in the applied sciences and to define initial and boundary
condition solutions of an equation of the form (3.1). In practical terms,
the coefficients A(x) and D(x) in (3.1) are the diffusion and drift
coefficients, respectively. D(x) is called the drift coefficient because, if
x has the unit length and t the unit time, then D(x) acquires the unit
length/time.
To obtain conditions for the coefficients A(x) and D(x) and for the
functions $f^*(\xi)$ and $b(t)$ appearing in (3.2), we substitute $z_0(x,t)$
into the equation (3.1) and obtain a first order ordinary equation involving
A(x) and D(x), a second order ordinary equation for $f^*(\xi)$, and a first
order ordinary equation for $b(t)$. In the absence of any further conditions
on $z_0(x,t)$, the differential relationship between A(x) and D(x) cannot be
uniquely solved. Practical considerations in a number of specific situations
required the diffusion coefficient to obey a power law of the form

$$A(x) = a\,x^{\lambda+1}, \quad a > 0. \tag{3.3}$$

The drift coefficient then becomes

$$D(x) = a p\, x^{\lambda} + \beta x, \quad \lambda < 1,\ p < 1,\ \beta \in \mathbb{R}. \tag{3.4}$$

The resulting equation for $f^*(\xi)$ has the particular solution

$$f^*(\xi) = \frac{1-\lambda}{\Gamma(1+q)}\, \xi^{-p} \exp(-\xi^{1-\lambda}), \quad q = (\lambda - p)(1-\lambda)^{-1}, \tag{3.5}$$

and the function $b(t)$ becomes

$$b(t) = \begin{cases} \left[ a(1-\lambda)^2\, t \right]^{(1-\lambda)^{-1}}, & \beta = 0, \\[1ex] \left[ a(1-\lambda)\beta^{-1} \left( 1 - \exp(-(1-\lambda)\beta t) \right) \right]^{(1-\lambda)^{-1}}, & \beta \ne 0. \end{cases} \tag{3.6}$$
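
The time dependence (3.6) of the scale parameter can be sketched as a small function. This is an illustration under assumptions: the function name is hypothetical, `lam` denotes lambda, and `math.expm1` is used only to keep the expression accurate when the product beta*t is small.

```python
import math

def scale_b(t, a, lam, beta):
    """Scale parameter b(t) of (3.6) for t > 0, a > 0, lam < 1."""
    if t <= 0 or a <= 0 or lam >= 1:
        raise ValueError("require t > 0, a > 0, lam < 1")
    k = 1.0 - lam
    if beta == 0:
        base = a * k * k * t
    else:
        # 1 - exp(-k*beta*t), computed stably for small arguments
        base = a * k / beta * (-math.expm1(-k * beta * t))
    return base ** (1.0 / k)
```

For beta > 0 the value approaches the finite limit (a*(1-lam)/beta)**(1/(1-lam)) as t grows, while for beta <= 0 it increases without bound; for very small beta the second branch agrees with the first, as expected from the limit of the formula.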


Mathematical aspects of the differential equation (3.1) with its
coefficients specified by (3.3) and (3.4), which has been designated the
generalized Feller equation, have been investigated in a sequence of papers
[9], [10], [11], [12]. The special types of the equation (3.1), (3.3), (3.4)
for the cases of interest in statistics in connection with the special
distributions listed in Section 1 have been given in [7].

Within the framework of this paper it is of interest to note that

(1) The function $z_0(x,t)$ in (3.2) with $f^*(\xi)$ and $b(t)$ specified in
(3.5) and (3.6), respectively, i.e.,

$$z_0(x,t) = \frac{1-\lambda}{\Gamma(1+q)}\, b^{-1}\, \xi^{-p} \exp(-\xi^{1-\lambda}), \tag{3.7}$$

is the delta function initial condition solution of (3.1), (3.3), (3.4), with
the delta function applied at x=0, t=0 [9]. In other words, the similarity
solution (3.7) describes the distribution process governed by (3.1), (3.3),
(3.4) from a completely concentrated initial state at x=0, t=0.

(2) If we "stop" this process at any time $t_0 > 0$, we see that, setting
$b(t_0) = b$ and comparing (3.7) and (1.3), the function $z_0(x,t_0)$ becomes
the probability density function f(x) of the process at $t = t_0$. This fact
opens up the intriguing opportunity of studying the statistical or
probabilistic behavior of the underlying process in time if the scale
parameter $b$ is allowed to vary according to (3.6).

(3) It is easily seen from (3.6) that $b(t) \uparrow +\infty$ as
$t \uparrow +\infty$ if the drift parameter $\beta \le 0$. This means the
process will "spread out" over the entire positive x-axis. However, if
$\beta > 0$, $b(t) \uparrow [a(1-\lambda)\beta^{-1}]^{(1-\lambda)^{-1}}$, a
finite constant, as $t \uparrow +\infty$. In other words, the process
approaches a steady state as $t \uparrow +\infty$ with a finite mean value.
(4) The function $z_0(x,t)$ given in (3.7) is a particular delta function
initial condition solution of the generalized Feller equation (3.1), (3.3),
(3.4). The delta function initial condition solution of this equation with
the delta function applied at $x = y > 0$ and $t = 0$ is given by

$$\begin{aligned}
v^*(x,t;y) = {} & (1-\lambda)\, b^{-1}\, \xi^{-(p+\lambda)/2}\, (e^{-\beta t}\eta)^{(p-\lambda)/2}\, I_q\!\left[ 2\,\xi^{(1-\lambda)/2} (e^{-\beta t}\eta)^{(1-\lambda)/2} \right] \\
& \times \exp\!\left[ -\xi^{1-\lambda} - (e^{-\beta t}\eta)^{1-\lambda} \right],
\end{aligned} \tag{3.8}$$

$\xi = x b^{-1}$, $\eta = y b^{-1}$, $b = b(t)$ given by (3.6), $x > 0$,
$t > 0$, $q = (\lambda - p)(1-\lambda)^{-1}$, $I_q$ = modified Bessel
function of the first kind (Bessel function of imaginary argument). This fact
has been established in [9]. (It is useful in this context to also consult
[11] and [12] for slight notational differences between this paper and [9].)

The function $v^*(x,t;y)$ has the following properties [9]:

(a) $v^*(x,t;y) > 0$, $x > 0$, $t > 0$, $y > 0$,

(b) $v^*(x,t;y) \downarrow 0$ as $t \downarrow 0$ for $x > 0$, $y > 0$, $x \ne y$,

(c) $v^*(x,t;x) \uparrow +\infty$ as $t \downarrow 0$, $x > 0$,

(d) $v^*(x,t;y) \to z_0(x,t)$ as $y \downarrow 0$ for $x > 0$, $t > 0$,

(e) $\int_0^\infty v^*(x,t;y)\,dx = 1$.

Clearly, these properties make the function $v^*(x,t_0;y)$ a one-sided
probability density function for $t = t_0 > 0$ and $y > 0$ fixed. In
particular, property (d) substantiates the claim made in the summary that the
family of distributions characterized by (1.3) is a special case of the much
more general family specified by $v^*(x,t_0;y)$.

In statistical distribution fitting attempts, in particular in cases
where the density data have a maximum, one reason for the frequent occurrence
of unsatisfactory fits results from the fact that the location of the maximum
of a distribution candidate cannot be chosen arbitrarily. It is normally
automatically determined by the basic parameters. For the density functions
given by (1.3), for example, the maximum is located at

$$x_{\max} = \left[ -p/(1-\lambda) \right]^{1/(1-\lambda)}\, b, \quad p < 0.$$

It is fixed once the parameters $b$, $p$, and $\lambda$ have been determined.
The class of functions $v^*(x,t_0;y)$ contains the additional independent
"delta function application parameter" $y$. The presence of this additional
parameter changes the situation drastically and favorably. A thorough
discussion of the class $v^*(x,t_0;y)$, however, will not be attempted here.
We return to the discussion of our main subject.
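
The location of the maximum of the density (1.3) for p < 0 can be verified numerically. The sketch below is a simple illustration with hypothetical function names and example parameter values; `lam` denotes lambda.

```python
import math

def amoroso_pdf(x, b, p, lam):
    """Density (1.3) for x > 0, b > 0, p < 1, lam < 1."""
    q = (lam - p) / (1.0 - lam)
    xi = x / b
    return (1.0 - lam) / math.gamma(1.0 + q) * xi ** (-p) * math.exp(-xi ** (1.0 - lam)) / b

def amoroso_mode(b, p, lam):
    """Location of the maximum of (1.3): b * (-p/(1-lam))**(1/(1-lam)), p < 0."""
    if p >= 0:
        raise ValueError("an interior maximum requires p < 0")
    return b * (-p / (1.0 - lam)) ** (1.0 / (1.0 - lam))

# Hypothetical parameter values chosen only for the check.
b, p, lam = 2.0, -1.5, -0.5
x_max = amoroso_mode(b, p, lam)
```

Evaluating `amoroso_pdf` slightly to the left and right of `x_max` gives smaller values, and in the Rayleigh case (p = lam = -1, b = sigma times the square root of 2) the formula returns sigma, the known Rayleigh mode.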
4. An Applicability Criterion


Inherent in any attempt to fit given empirical distribution data by
means of an analytically defined probability density function are three
crucial problems, namely (i) candidate function selection from a group of
available functions, (ii) determination (estimation) of the parameters of the
selected function, and (iii) evaluation of the achieved quality of fit. Since
an adequate treatment of the last two problems requires a thorough discussion
of the details of the numerical techniques involved, they shall be left
untouched here. This subject, relative to the class of distributions which
represent the topic of the present paper, will be picked up in a separate
publication. We shall concentrate, therefore, on the first problem and
present a general applicability criterion for the distribution class defined
by the cumulative distribution functions (1.1) or by the associated density
functions (1.3). This criterion covers all special cases mentioned in
Section 1.

Let us consider the distribution function F(x) given in the form (1.2),
i.e.,

$$F(x) = \frac{1}{\Gamma(2+q)}\, \xi^{1-p}\, \Phi(1+q,\ 2+q;\ -\xi^{1-\lambda}), \quad \xi = x b^{-1}.$$

Taking logarithms, we obtain

$$\log F(x) = -\log \Gamma(2+q) + (1-p)\log \xi + \log \Phi(1+q,\ 2+q;\ -\xi^{1-\lambda}). \tag{4.1}$$
At this point it will be advantageous to perform the independent variable
transformation $x = \mu y$ where $\mu = m_1$ is the mean (first moment), which
can easily be determined from given empirical data. This transformation
ensures that all x data in the interval $0 < x < \mu$ will be mapped into y
data in the interval $0 < y < 1$. This is important, as will become apparent
momentarily. Setting then $\log F(x) = \log F(\mu y) = v$ and $\log y = u$ so
that

$$\log \xi = \log(x/b) = u - \log(\mu^{-1} b),$$

we obtain from (4.1) the functional relation

$$v(u) = -\log \Gamma(2+q) - (1-p)\log(\mu^{-1} b) + (1-p)\,u + \log \Phi\!\left(1+q,\ 2+q;\ -(e^{u}/\mu^{-1}b)^{1-\lambda}\right). \tag{4.2}$$

The degenerate hypergeometric function $\Phi$ is defined as a power
series in its last argument with constant term equal to unity. Therefore,
as $x \downarrow 0$, i.e., as $y \downarrow 0$, which means as
$u \downarrow -\infty$,

$$\log \Phi\!\left(1+q,\ 2+q;\ -(e^{u}/\mu^{-1}b)^{1-\lambda}\right) \to 0.$$

Consequently, the function v(u) given in (4.2) is asymptotically linear in
u as $u \downarrow -\infty$. In other words,

$$v(u) \sim v_1(u) = (1-p)\,u - \log \Gamma(2+q) - (1-p)\log(\mu^{-1} b), \quad u \downarrow -\infty.$$

This asymptotic linearity property may also be expressed by saying that, as
$u \downarrow -\infty$, the graph of the function v(u) defined in (4.2)
approaches the (straight line) asymptote defined by the linear equation

$$v_1(u) = (1-p)\,u - \log \Gamma(2+q) - (1-p)\log(\mu^{-1} b). \tag{4.3}$$
Based on this fact we can formulate the following Applicability Criterion.

A distribution function F(x) of the class (1.1) may be considered as a
candidate for a data fit if the logarithmic plot of a given set of empirical
cumulative distribution data indicates the existence of an asymptote.


Remarks. (1) An applicability criterion similar to the one
expressed above for the logarithm of the cumulative distribution data can,
of course, be formulated for the corresponding density data according to
(1.3). Which of these two equivalent criteria is actually being used is
immaterial. The one given in terms of the cumulative data is generally
preferred simply because the cumulative data are normally "smoother" than the
corresponding density data.

(2) An asymptotic linearity criterion similar to the one expressed
above for the distribution class (1.1) holds for the class of distributions
defined by the density function $v^{*}(x, t_0; y)$ given in (3.8). This is
easily seen. If we denote the cumulative distribution function associated
with $v^{*}(x, t_0; y)$ by V(x), then

$$V(x) \sim F(x)\, \exp\!\left[-\left(e^{-\delta t_0}\right)^{1-\lambda}\right] \qquad \text{as } x \downarrow 0,$$

where F(x) is given by (1.1). We shall not go into any details here.

There is important practical utility associated with the applicability
criterion. This becomes evident when we realize that it can be used to
determine approximate values $p_1$ and $\lambda_1$ for the two shape
parameters p and $\lambda$. An approximate value $b_1$ for the scale
parameter b can then be determined by means of the first moment,

$$b = \mu\, \frac{\Gamma(1+q)}{\Gamma\!\left(1+q+(1-\lambda)^{-1}\right)}, \tag{4.4}$$

if we substitute in $q = (\lambda - p)(1-\lambda)^{-1}$ the values $p_1$ and
$\lambda_1$ for p and $\lambda$, respectively.
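As a numerical illustration of (4.4), the following minimal sketch evaluates the ratio b/μ from the two shape parameters. It assumes the notation β = 1 - λ, so that 1 + q = (1 - p)/β; the function names are hypothetical, and the sample inputs are the Case 1 values reported in Section 5 (P, Beta, Theta), read here under the assumption that Beta denotes 1 - λ and Theta denotes b/μ.

```python
import math


def scale_factor(p: float, beta: float) -> float:
    """Return b / mu from (4.4), writing beta for 1 - lambda (assumed notation)."""
    a = (1.0 - p) / beta  # a = 1 + q, because q = (lambda - p) / (1 - lambda)
    # Work with log-gamma to avoid overflow for large arguments.
    return math.exp(math.lgamma(a) - math.lgamma(a + 1.0 / beta))


def scale_parameter(mean: float, p: float, beta: float) -> float:
    """Scale parameter b obtained from the first moment (the mean)."""
    return mean * scale_factor(p, beta)


# Case 1 values as reported in Section 5 (assumed reading of the table).
theta = scale_factor(-1.974, 3.584)
b = scale_parameter(21.378, -1.974, 3.584)
print(f"theta = {theta:.3f}, b = {b:.3f}")
```

For the exponential distribution (p = 0, λ = 0) the factor reduces to Γ(1)/Γ(2) = 1, so that b equals the mean, which is a convenient sanity check.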

If a set of empirical distribution data indicates the existence of
an asymptote for the logarithmic cdf graph, the location of the asymptote can
be estimated either by visual inspection or by analytic methods. Numerical
techniques for the asymptote determination and for the subsequent estimation
of parameters will be discussed elsewhere. The location of the asymptote can
be specified by its directional angle $\vartheta$ and its intersection with
the v-axis. Since the asymptote is determined by the linear equation (4.3),
we immediately see that

$$\tan\vartheta = 1 - p. \tag{4.5}$$

This relation makes it possible to quickly find an approximate value $p_1$
for the initial shape parameter p once $\vartheta$ or $\tan\vartheta$ has
been estimated,

$$p_1 = 1 - \tan\vartheta.$$

It is of interest to note that, according to (4.5), the principal value of
$\vartheta$ is uniquely determined by the initial shape parameter p and vice
versa. Since $p < 1$, we have $0 < \vartheta < \pi/2$. Some of the
distributions listed as special cases in Section I have very specific
$\tan\vartheta$ values. For the Gauss and exponential distributions we have
p = 0 so that $\tan\vartheta = 1$. For the Rayleigh distribution p = -1,
which means that $\tan\vartheta = 2$. For the Maxwell case p = -2,
$\tan\vartheta = 3$, and in the case of the Wien distribution p = -3 so
that $\tan\vartheta = 4$.
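The paper leaves the analytic determination of the asymptote to a later publication. As one possible stand-in for visual inspection, the sketch below fits a least-squares line to the left-hand part of the logarithmic cdf plot and converts its slope to p_1 via (4.5). The cutoff `u_max` and the function names are assumptions made for illustration; they are not taken from the paper.

```python
def fit_asymptote(u, v, u_max):
    """Least-squares line through the points (u, v) with u <= u_max.

    Returns (slope, intercept); the slope estimates tan(theta) and the
    intercept estimates v_1(0).
    """
    pts = [(ui, vi) for ui, vi in zip(u, v) if ui <= u_max]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points to the left of u_max")
    su = sum(ui for ui, _ in pts)
    sv = sum(vi for _, vi in pts)
    suu = sum(ui * ui for ui, _ in pts)
    suv = sum(ui * vi for ui, vi in pts)
    slope = (n * suv - su * sv) / (n * suu - su * su)
    intercept = (sv - slope * su) / n
    return slope, intercept


def initial_shape_parameter(slope):
    """p_1 = 1 - tan(theta), from (4.5)."""
    return 1.0 - slope


# Synthetic points lying exactly on a line of slope 2 (the Rayleigh case).
u = [-5.0, -4.0, -3.0, -2.0]
v = [2.0 * ui - 0.5 for ui in u]
slope, v0 = fit_asymptote(u, v, u_max=-2.0)
print(initial_shape_parameter(slope), v0)
```

With real data only the points far to the left, where the plot has visibly straightened, should be included in the fit.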

Next, once the v-axis intercept $v_1(0)$ of the asymptote has
approximately been determined, we obtain from (4.3) the equation

$$-\log\Gamma(2+q) - (1-p)\log \mu^{-1}b - v_1(0) = 0.$$

We eliminate the unknown scale parameter b by means of (4.4), which leads
to the equality

$$-\log\Gamma(2+q) - (1-p)\log\Gamma(1+q) + (1-p)\log\Gamma\!\left(1+q+(1-\lambda)^{-1}\right) - v_1(0) = 0. \tag{4.6}$$

The left-hand side becomes a function of the unknown terminal shape
parameter $\lambda$ if we replace p by the previously determined approximate
value $p_1$. In other words, we obtain from (4.6) an equation of the form
$\varphi(1-\lambda) = 0$. It can be shown that it has exactly one solution
$1-\lambda_1 > 0$ (provided $v_1(0)$ has been properly determined) which can
easily be obtained by means of Newton's method.
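A minimal sketch of the Newton iteration for (4.6) is given below. It assumes the notation β = 1 - λ and, because the paper does not spell out the derivative, replaces the exact derivative of φ by a central difference; the starting value, tolerances, and function names are likewise assumptions.

```python
import math


def phi(beta, p, v0):
    """Left-hand side of (4.6) as a function of beta = 1 - lambda."""
    s = 1.0 - p
    a = s / beta  # a = 1 + q
    return (-math.lgamma(1.0 + a) - s * math.lgamma(a)
            + s * math.lgamma(a + 1.0 / beta) - v0)


def solve_beta(p, v0, beta0=1.0, tol=1e-10, max_iter=100):
    """Newton iteration for phi(beta) = 0 using a numerical derivative."""
    beta = beta0
    for _ in range(max_iter):
        h = 1e-6 * beta
        slope = (phi(beta + h, p, v0) - phi(beta - h, p, v0)) / (2.0 * h)
        step = phi(beta, p, v0) / slope
        new_beta = beta - step
        while new_beta <= 0.0:  # keep the iterate in the region beta > 0
            step *= 0.5
            new_beta = beta - step
        if abs(new_beta - beta) < tol * max(1.0, abs(beta)):
            return new_beta
        beta = new_beta
    raise RuntimeError("Newton iteration did not converge")


# Round trip: build the intercept from known parameters, then recover beta.
p1 = -1.974
v0 = phi(3.584, p1, 0.0)  # value of v_1(0) implied by p = -1.974, beta = 3.584
print(solve_beta(p1, v0))
```

The round trip only checks internal consistency; with real data the quality of the result depends on how well the intercept has been read off the plot.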

The opportunity to determine "good" approximate values $p_1$ and
$\lambda_1$ for the shape parameters p and $\lambda$ is extremely important
for the practical application of the distribution class (1.1). The
approximate values $p_1$ and $\lambda_1$ can be used as initial values in
any of the established parameter estimation techniques such as, for example,
the method of moments or the maximum-likelihood method. Each of these
methods leads to three equations for the unknown parameters b, p, and
$\lambda$. Actually, only two equations are needed since the scale parameter
b can be eliminated. The use of the initial values $p_1$ and $\lambda_1$
results in rapid convergence of the iteration process which will lead to
the desired final parameter values.

Although the class of probability distributions discussed in this paper
has been known for more than sixty years, its application has been limited,
most likely as a consequence of computational intensity and possible
convergence problems. In general, however, it is not really the complexity
of the system of transcendental equations which makes the numerical problem
computationally intensive but rather a poor choice of initial iteration
values. It is hoped that the approach presented here will lead to more
widespread use of the distribution class (1.1).

5. Empirical Examples


In the talk presented at the Madison conference two examples based
on empirical data were discussed. As indicated at the beginning of
Sec. 4, a thorough treatment of practical examples will not be attempted in
this paper. Suffice it, therefore, to simply present the illustrative
documentation for the two parameter estimates.

The empirical data were available in histogram (pdf) form as shown
in the first figure of each of the two sets of illustrations. The cdf data
were obtained by numerical integration. Their logarithmic plots are shown
in the second figures, $\bar{x} = \mu$ being the mean. The asymptote data
$\tan\vartheta$ and $v_1(0)$ were determined by visual inspection to obtain
approximate values $p_1$ and $\beta_1 = 1 - \lambda_1$ for the two shape
parameters. To improve the numerical values of these parameters the method
of moments was used, which led to the final values given in the table. The
scale parameter b is determined by $b = \mu\theta$, $\mu$ = mean. The last
pair of figures show the histograms overlaid with the fitted probability
density functions.

CASE #1: HUNTSVILLE AL, DAILY TEMP. RANGE, ANNUAL DISTRIB.

Empirical Distribution
    No. of Obs.:    9841
    Mean:           21.378   (1)
    Std. Dev.:       7.381   (1.1192)
    3rd Mom.:       16.582   (1.3593)

Fitted Distrib. (Moments Fit)
    P     =  -1.974
    Beta  =   3.584
    Theta =   1.195

CASE #2: HUNTSVILLE AL, RAINFALL INTENSITY, WINTER DISTRIB.

Empirical Distribution
    No. of Obs.:    [illegible]
    Mean:           11.736   (1)
    Std. Dev.:       6.392   (1.2966)
    3rd Mom.:      131.705   (1.[?]714)

Fitted Distrib. (Moments Fit)
    P     =  -0.584
    Beta  =   2.5[?]5
    Theta =   1.46[?]

[Figures, Case #1: HSV Daily Temp. Range, Annual Distrib. Horizontal axis:
Temperature (deg F); vertical axis: Rel. Frequency (pdf).]

[Figures, Case #2: HSV Rainfall Intensity, Winter Distrib. Horizontal axis:
Rainfall Intensity (class); vertical axis: Rel. Frequency (pdf).]

PLOTTING MATHEMATICAL FUNCTIONS
ON A STANDARD LINE PRINTER


DONALD W. RANKIN
Lieutenant Colonel
US Air Force Retired


INTRODUCTION. Often the analyst will be greatly
aided if he can view a graph of the function or
data under investigation. The wide availability
of computer-driven printers suggests that they be
adapted to this usage. However, since that is
not their primary purpose, some programming is
required to exact an acceptable performance from
them. This paper, then, discusses some of the
principles which must be adhered to and offers
some example programs.

No attempt can be made to cover all possible
printer-computer combinations, since their number
approaches the astronomical. (A recent issue of
a periodical lists 145 low- and medium-priced
printers from 36 different manufacturers which
are compatible with the author's computer!)
Instead, a typical combination* is put forward
as an example.

Programming  language  will  be  confined  to  the 
most  elementary  BASIC,  so  that  even  the  casual 
programmer  will  feel  comfortable.  The  commands 
CALL,  PEEK,  and  POKE  will  not  be  used.  There  is 
little  need  for  streamlining,  since  even  a  clumsy 
program  will  run  faster  than  the  printer. 

TYPES  OF  PRINTERS.  The  principles  herein  can  be 
applied  to  virtually  all  printers,  whether  dot 
matrix,  daisy  wheel,  ink  jet  or  thermal  ribbon. 
Another  criterion  will  be  used  to  roughly  divide 
printers  into  three  categories. 

The first type possesses a resident plotting
function. For them, this paper is not necessary,
although it may contribute some insight.

The second type is capable of a variable
reverse line feed. The principal example
program is written for this type.

The third type has neither of the above
attributes. As will be seen, plotting still
may be possible.

*An Epson model FX-80 printer driven by a Radio Shack model 100
portable computer.

SENDING INFORMATION TO THE LINE PRINTER. Most
computers send intelligence to the printer as
a stream of 8-bit binary numbers (00000000 to
11111111). This corresponds to 0-255 (decimal)
or 00-FF (hexadecimal). Some computers send
only 7 bits of data, reserving the eighth bit
for a parity check or other special use. They
cannot distinguish 0xxxxxxx from 1xxxxxxx. This
amounts to subtracting 128 wherever possible.
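The effect of a 7-bit link can be illustrated with a short sketch. It is only a model of the behaviour described above, and the function name is hypothetical.

```python
def seven_bit(code: int) -> int:
    """Value that arrives when the eighth (most significant) bit is dropped."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in eight bits")
    return code & 0x7F  # same as subtracting 128 whenever code >= 128


# 0xxxxxxx and 1xxxxxxx collapse to the same value.
print(seven_bit(0b01000001), seven_bit(0b11000001))
```

A plotting program intended for such a link would therefore have to restrict every parameter byte to the range 0 to 127.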

THE CHARACTER-STRING FUNCTION. One means by
which BASIC converts information into suitable
form is the character-string function, which is
implemented by CHR$(n), where n can vary from
0 to 255. Values of n from 32 to 127 are used
to send various symbols, including punctuation,
numbers, and all the letters of the alphabet.
For example, CHR$(65) sends a capital A. Values
from 0 to 31 are used to send instructions to
the various peripherals, and are called control
codes. CHR$(27) is called the ESCAPE code. It
alerts the peripheral that one or more binary
numbers are to follow, and that the sequence is
to be treated as an entity. By using ESCAPE
sequences, the number of possible control codes
becomes almost unlimited.
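For readers more familiar with a modern language, the ranges just described can be sketched as follows. The category names and helper functions are illustrative assumptions; the label "extended" for values above 127 is not used in the paper.

```python
ESC = 27  # the ESCAPE code


def classify(n: int) -> str:
    """Rough category of a code value, following the ranges in the text."""
    if not 0 <= n <= 255:
        raise ValueError("n must be between 0 and 255")
    if n < 32:
        return "control"
    if n < 128:
        return "symbol"
    return "extended"


def escape_sequence(*codes: int) -> bytes:
    """ESCAPE followed by the given codes, to be treated as one entity."""
    return bytes([ESC, *codes])


print(chr(65), classify(27), escape_sequence(74, 10))
```

Here `chr(65)` plays the role of CHR$(65), and `escape_sequence(74, 10)` corresponds to CHR$(27)CHR$(74)CHR$(10).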

Another method of converting to binary is
to enclose the actual symbols within quotation
marks. Thus LPRINT "A" and LPRINT CHR$(65) are
equivalent. This latter method depends upon
the existence of the appropriate symbol, and
hence cannot be used to transmit control codes.
Also it cannot be used to send actual quotation
marks, since BASIC only recognizes them as a
sort of switch which turns a binary converter
on and off. CHR$(34) must be used.

Many software designers "borrow" one or
more little-used control codes, diverting them
to special uses. When, in running a program,
one of them occurs by chance, the result can
be most unexpected (and quite unwanted). It
is necessary to identify these anomalies, so
that the program can avoid them.

THE  HEX  DUMP.  The  easiest  way  to  examine  the 
information  which  the  computer  is  transmitting 
to  the  printer  is  to  perform  a  HEX  dump.  The 
printer  is  placed  in  hexadecimal  mode  and  the 
following  program  executed: 

10 FOR N = 0 TO 255
20 LPRINT CHR$(N);
30 NEXT N
40 END


The resulting printout will identify the codes in
question. Note the semicolon at the end of line 20.
It inhibits the carriage return. Figure 1 gives an
example of a HEX dump.

Figure 1
Radio Shack Model 100 HEX Dump

00 01 02 03 04 05 06 07 08 20 20 20 20 20 20 20
20 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18
19 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29
2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38 39
3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49
4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59
5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69
6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79
7A 7B 7C 7D 7E 7F 80 81 82 83 84 85 86 87 88 89
8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98 99
9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
AA AB AC AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B8 B9
BA BB BC BD BE BF C0 C1 C2 C3 C4 C5 C6 C7 C8 C9
CA CB CC CD CE CF D0 D1 D2 D3 D4 D5 D6 D7 D8 D9
DA DB DC DD DE DF E0 E1 E2 E3 E4 E5 E6 E7 E8 E9
EA EB EC ED EE EF F0 F1 F2 F3 F4 F5 F6 F7 F8 F9
FA FB FC FD FE FF 0D



Referring  to  Figure  1,  it  can  be  seen  that 
CHR$(9)  sends  a  series  of  spaces,  while  CHR$(26) 
transmits  nothing  at  all.  It  will  be  necessary 
to  program  around  these  two  values. 
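The inspection of Figure 1 can be mimicked in a few lines. The sketch below models the behaviour observed in the dump (code 09 arriving as eight spaces, code 1A not arriving at all) and then lists the codes that never reach the printer; the model and the function names are assumptions for illustration, not a description of the computer's firmware.

```python
def observed_stream(codes):
    """Byte stream as seen in Figure 1: 09 becomes eight spaces, 1A is dropped."""
    out = bytearray()
    for c in codes:
        if c == 0x09:
            out.extend(b" " * 8)
        elif c == 0x1A:
            continue
        else:
            out.append(c)
    return bytes(out)


def hex_dump(data, width=16):
    """Rows of two-digit hexadecimal values, `width` bytes per row."""
    return [" ".join(f"{b:02X}" for b in data[i:i + width])
            for i in range(0, len(data), width)]


received = observed_stream(range(256)) + b"\r"  # final carriage return
problem_codes = [c for c in range(256) if c not in set(received)]
for row in hex_dump(received)[:3]:
    print(row)
print(problem_codes)
```

Comparing the list of codes sent with the list received is a quicker check than reading the dump by eye, although it would miss a code that is altered into another valid code.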

SCALING THE PLOT. Some daisy wheel printers
have adjustable horizontal spacing. Dot matrix
printers achieve somewhat the same effect by
offering a selection of type faces. The ability
to adjust vertical spacing varies widely. As a
rule of thumb, assign the coarser scale factor
to the independent variable.

[Figure 2: type face samples; legible labels: "Type", "Compressed".]

If a printer is capable of reverse line
feeds, it is possible to scale and label the
plotting area, then return the carriage and
platen to a known position before beginning
the actual plot. Without this capability, it
is necessary to mark the paper in some way so
that the platen can be correctly repositioned
manually.

PLANNING  A  PLOTTING  PROGRAM.  As  an  exercise, 
let  us  write  a  program  which  plots  two  functions 
simultaneously,  using  different  plotting  symbols 
for  each,  so  that  they  may  be  distinguished. 

Let  us  assume  a  dot  matrix  printer  capable  of 
compressed  type  face  and  variable  reverse  line 
feed.  Further  let  us  assume  a  computer  which 
diverts 09 and 1A (hex) to special uses. (We
recall that these codes are generated by CHR$(9)
and CHR$(26), respectively.) Available plotting
area is 6 by 6 inches.

To generate variable line feeds, the ESCAPE
sequences 1B;4A;nn (forward) and 1B;6A;nn (reverse)
are used, where nn can vary from 00 to
FF (except 09 and 1A, of course). BASIC uses
CHR$(27)CHR$(74)CHR$(N) and CHR$(27)CHR$(106)
CHR$(N) to send these sequences (N = 0 to 255).
Since symbols exist for CHR$(74) and CHR$(106),
the shorter forms CHR$(27)"J"CHR$(N) and
CHR$(27)"j"CHR$(N) can be used. Some computers may
require semicolons between the parts. For the
printer which was employed, a value of N = 255
moves the platen exactly 3 cm. Thus there are
85 machine counts per cm., or 216 per inch.
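The two escape sequences and the scale arithmetic can be checked with a short sketch. The byte values are those quoted above; the function name is hypothetical.

```python
CM_PER_INCH = 2.54


def line_feed(n: int, reverse: bool = False) -> bytes:
    """ESC J n (forward) or ESC j n (reverse) as a three-byte sequence."""
    if not 0 <= n <= 255:
        raise ValueError("n must be between 0 and 255")
    return bytes([0x1B, 0x6A if reverse else 0x4A, n])


counts_per_cm = 255 / 3.0                    # N = 255 moves the platen 3 cm
counts_per_inch = counts_per_cm * CM_PER_INCH
print(line_feed(255), counts_per_cm, round(counts_per_inch))
```

The figure of 216 per inch is the rounded value of 85 x 2.54 = 215.9.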

By using the compressed type face for plotting,
we find 27 character positions per 4 cm., or
103 per 6 inches. This is far coarser than the
216 counts per inch available vertically, so it is
apparent at once that the independent variable
should vary in the horizontal direction.

For  an  example  plot,  choose  the  tangent 
and  cosine  functions  through  the  range  from  0 
to  240  degrees,  inclusive.  Assigning  a  scale 
factor  of  2.5  degrees  per  character,  the  plot 
will  be  97  characters  wide  (compressed),  which 
leaves  a  few  for  labelling.  The  computer  re¬ 
quires  that  the  argument  be  expressed  in  rad¬ 
ians,  so  that  one  character  is  equivalent  to 
0.0436332313  radians.  Successive  values  of  the 
functions  are  computed  by  a  routine  similar  to: 

10 FOR X = 0 TO 96
20 C = COS(0.0436332313 * X)
30 IF ABS(C) < 0.3 THEN 50
40 T = TAN(0.0436332313 * X)
50 NEXT X

Line  30  is  not  essential.  It  merely  avoids 
computing  large  values  of  the  tangent  which 
would  not  be  plotted  anyway. 
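The same sampling scheme can be written compactly in a modern language, which also confirms the radian constant. This is a sketch only; the function name and the use of `None` for skipped tangent values are choices made for illustration.

```python
import math

STEP = math.radians(2.5)  # one character column = 2.5 degrees


def sample(columns=97, cos_cutoff=0.3):
    """(column, cosine, tangent) triples; tangent is None near its poles."""
    rows = []
    for x in range(columns):
        c = math.cos(STEP * x)
        # Mirror line 30: skip the tangent where the cosine is small.
        t = math.tan(STEP * x) if abs(c) >= cos_cutoff else None
        rows.append((x, c, t))
    return rows


rows = sample()
print(f"{STEP:.10f}", len(rows), rows[36][2], rows[96][1])
```

Column 36 corresponds to 90 degrees, where the tangent is skipped, and column 96 to 240 degrees, the right-hand edge of the plot.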

For the vertical scale, let us choose
unity to be 1.25 inches. Now the ordinates
can be easily read with a common foot ruler,
since 0.1 = 1/8 in. Multiplying 216 by 1.25,
it is found that there are 270 machine counts
per unit on the vertical axis. Values to about
±2.15 can be displayed within the allotted area.

The number to be converted to binary by the
character-string function must be an integer.
This can be accomplished by:

Y = INT(.5 + 270 * C)

Computed in this way, there is no need to worry
about sign.
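The remark about sign can be verified directly: BASIC's INT rounds toward minus infinity, so adding 0.5 first gives round-to-nearest for both positive and negative ordinates. A minimal sketch, with hypothetical names:

```python
import math

COUNTS_PER_INCH = 216
INCHES_PER_UNIT = 1.25
COUNTS_PER_UNIT = COUNTS_PER_INCH * INCHES_PER_UNIT  # 270 machine counts


def to_counts(value: float) -> int:
    """Ordinate in machine counts; math.floor plays the role of BASIC's INT."""
    return math.floor(0.5 + COUNTS_PER_UNIT * value)


print(to_counts(1.0), to_counts(-1.0), to_counts(0.5), to_counts(-0.5))
```

The limit of about ±2.15 quoted above corresponds to the cutoff of 580 counts used in the program, since 580/270 is roughly 2.15.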

There is one more point to consider. The
program must be given a memory. It is vital
that it be able to "remember" the position of
the platen. For this express purpose, the
variable LY (last Y) is established.

Below  is  an  example  program,  followed  by 
explanatory  notes. 


A SAMPLE PLOTTING PROGRAM.


2790 END
3000 LPRINT CHR$(27)"l"CHR$(10):LPRINT CHR$(27)"A"CHR$(8)
3010 J$ = "|           |           |           |           |           |           |           |           |":LPRINT CHR$(15);J$:LPRINT J$
3020 FOR N% = 1 TO 22
3030 LPRINT "|";TAB(36);"|";TAB(72);"|"
3040 NEXT N%
3050 LPRINT J$
3060 FOR N% = 1 TO 22
3070 LPRINT "|";TAB(36);"|";TAB(72);"|"
3080 NEXT N%
3090 LPRINT J$:LPRINT J$
3100 LPRINT CHR$(18);"0";TAB(7);"30";TAB(14);"60";TAB(21);"90";TAB(28);"120";TAB(35);"150";TAB(42);"180";TAB(49);"210";TAB(56);"240":LPRINT
3110 LPRINT TAB(26);"Degrees";CHR$(15);CHR$(27)"A"CHR$(0)
3120 K$ = "-------------------------------------------------------------------------------------------------"
3130 LPRINT CHR$(27)"j"CHR$(108);K$;CHR$(18);"-2";CHR$(15)
3140 LPRINT CHR$(27)"j"CHR$(135);CHR$(27)"j"CHR$(135);K$;CHR$(18);"-1";CHR$(15)
3150 LPRINT CHR$(27)"j"CHR$(135);CHR$(27)"j"CHR$(135);K$;CHR$(18);" 0";CHR$(15)
3160 LPRINT CHR$(27)"j"CHR$(135);CHR$(27)"j"CHR$(135);K$;CHR$(18);" 1";CHR$(15)
3170 LPRINT CHR$(27)"j"CHR$(135);CHR$(27)"j"CHR$(135);K$;CHR$(18);" 2"
3180 LPRINT CHR$(27)"J"CHR$(72);TAB(31);"Figure 3"
3190 LPRINT CHR$(27)"J"CHR$(72);TAB(30);"+ y = cos x"
3200 LPRINT CHR$(27)"J"CHR$(54);TAB(30);"* y = tan x";CHR$(15)
3210 LPRINT CHR$(27)"J"CHR$(171);CHR$(27)"J"CHR$(171)
3220 Y = 0
3230 FOR X = 0 TO 96
3240 C = 270 * COS(0.0436332313 * X) : T = 999
3250 IF ABS(C) < 99 THEN 3270
3260 T = 270 * TAN(0.0436332313 * X)
3270 IF ABS(T) > 580 THEN 3460
3280 LY = Y : Y = INT(.5 + T)
3290 IF Y > LY THEN 3390
3300 IF Y < LY THEN 3320
3310 LPRINT "*";CHR$(8); : GOTO 3460
3320 IF (LY - Y) < 256 THEN 3340
3330 LY = LY - 255 : LPRINT CHR$(27)"J"CHR$(255); : GOTO 3320
3340 IF (LY - Y) = 26 THEN 3370
3350 IF (LY - Y) = 9 THEN 3380
3360 LPRINT CHR$(27)"J"CHR$(LY-Y);"*";CHR$(8); : GOTO 3460
3370 LPRINT CHR$(27)"J"CHR$(13);CHR$(27)"J"CHR$(13);"*";CHR$(8); : GOTO 3460
3380 LPRINT CHR$(27)"J"CHR$(4);CHR$(27)"J"CHR$(5);"*";CHR$(8); : GOTO 3460
3390 IF (Y-LY) < 256 THEN 3410
3400 LY = LY + 255 : LPRINT CHR$(27)"j"CHR$(255); : GOTO 3390
3410 IF (Y-LY) = 26 THEN 3440
3420 IF (Y-LY) = 9 THEN 3450
3430 LPRINT CHR$(27)"j"CHR$(Y-LY);"*";CHR$(8); : GOTO 3460
3440 LPRINT CHR$(27)"j"CHR$(13);CHR$(27)"j"CHR$(13);"*";CHR$(8); : GOTO 3460
3450 LPRINT CHR$(27)"j"CHR$(4);CHR$(27)"j"CHR$(5);"*";CHR$(8);
3460 LY = Y : Y = INT(.5 + C)
3470 IF LY > Y THEN 3500
3480 IF LY < Y THEN 3570
3490 LPRINT "+"; : GOTO 3640
3500 IF (LY-Y) < 256 THEN 3520
3510 LPRINT CHR$(27)"J"CHR$(255); : LY = LY - 255 : GOTO 3500
3520 IF (LY-Y) = 26 THEN 3550
3530 IF (LY-Y) = 9 THEN 3560
3540 LPRINT CHR$(27)"J"CHR$(LY-Y);"+"; : GOTO 3640
3550 LPRINT CHR$(27)"J"CHR$(13);CHR$(27)"J"CHR$(13);"+"; : GOTO 3640
3560 LPRINT CHR$(27)"J"CHR$(4);CHR$(27)"J"CHR$(5);"+"; : GOTO 3640
3570 IF (Y-LY) < 256 THEN 3590
3580 LPRINT CHR$(27)"j"CHR$(255); : LY = LY + 255 : GOTO 3570
3590 IF (Y-LY) = 26 THEN 3620
3600 IF (Y-LY) = 9 THEN 3630
3610 LPRINT CHR$(27)"j"CHR$(Y-LY);"+"; : GOTO 3640
3620 LPRINT CHR$(27)"j"CHR$(13);CHR$(27)"j"CHR$(13);"+"; : GOTO 3640
3630 LPRINT CHR$(27)"j"CHR$(4);CHR$(27)"j"CHR$(5);"+";
3640 NEXT X
3650 LY = Y : Y = -720
3660 IF (LY-Y) < 256 THEN 3680
3670 LY = LY - 255 : LPRINT CHR$(27)"J"CHR$(255) : GOTO 3660
3680 IF (LY-Y) = 26 THEN 3710
3690 IF (LY-Y) = 9 THEN 3720
3700 LPRINT CHR$(27)"J"CHR$(LY-Y) : GOTO 3730
3710 LPRINT CHR$(27)"J"CHR$(13);CHR$(27)"J"CHR$(13) : GOTO 3730
3720 LPRINT CHR$(27)"J"CHR$(4);CHR$(27)"J"CHR$(5)
3730 LPRINT CHR$(27)"2";CHR$(18)
3740 RETURN


NOTES ON THE PROGRAM.

Line 3000. Sets left margin to 1.25 in. Sets
line feed to 1/9 in. for cosmetic purposes.
The colon is used to separate statements on
the same numbered line. Some computers may
require a different symbol.

Lines 3010-3090. Plots the vertical grid.
CHR$(15) calls up the compressed type face.
The string variable must contain a count
of 11 spaces between each symbol "|".
The symbol is generated by CHR$(124), or it
can be reached from the keyboard with the
keystrokes <SHIFT><GRPH><->.

Lines 3100-3200. Plots the horizontal grid,
labelling as it goes. Note that the plot is
in Compressed type face, but the labelling
is done in Pica. The width ratio is 7:12.
CHR$(18) restores Pica.

Line 3110. CHR$(15) calls for the Compressed
type face. CHR$(27)"A"CHR$(0) kills the
line feed associated with a carriage return.

Lines 3210-3220. The platen, carriage, and
dependent variable are zeroed.

Lines 3230-3640. Computation and plotting are
accomplished by means of a FOR-NEXT loop.

Line 3270. This places a limit on the values
which will be plotted. Without this limit,
the program might attempt to plot a point off
the paper, thereby jamming the paper under
the platen.

Line 3310. CHR$(8) generates a backspace. The
trailing semicolon inhibits the carriage
return.

Lines 3320-3330. Moves platen in steps of 3
cm. when required.

Lines 3340-3380. Moves the platen and plots the
point, avoiding the problem codes 9 and 26.
This pattern is repeated three times (two
functions, two signs).

Lines 3650-3720. The platen is moved to the
bottom of the plot, in position for following
text.

Line 3730. Restores normal line feed and Pica
type face.

Line 3740. If the program is not used as a
sub-routine, substitute "END" or "GOTO nnn".
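The platen-movement pattern described in the notes for lines 3320-3380 can be condensed into one routine. The sketch below is a restatement in Python, not the author's code: it splits a movement into parameter bytes of at most 255, replaces the problem values 26 and 9 by 13 + 13 and 4 + 5, and picks the reverse or forward escape sequence from the sign of the movement. Function names are hypothetical.

```python
def feed_steps(distance: int) -> list:
    """Split a movement of `distance` machine counts (> 0) into safe bytes."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    steps = []
    while distance >= 256:      # lines 3320-3330: move in steps of 255
        steps.append(255)
        distance -= 255
    if distance == 26:          # avoid code 1A (hex)
        steps += [13, 13]
    elif distance == 9:         # avoid code 09 (hex)
        steps += [4, 5]
    else:
        steps.append(distance)
    return steps


def move_platen(last_y: int, y: int) -> bytes:
    """Escape sequences that take the platen from last_y to y."""
    if y == last_y:
        return b""
    command = 0x6A if y > last_y else 0x4A  # "j" reverse raises the point
    return b"".join(bytes([0x1B, command, n])
                    for n in feed_steps(abs(y - last_y)))


print(feed_steps(281), feed_steps(9), move_platen(0, 270))
```

Writing the logic once and calling it for both functions and both directions would remove the fourfold repetition in the BASIC listing, at the cost of the GOSUB overhead the author chose to avoid.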

Figure 3 illustrates the program exercised.

PLOTTING WITHOUT VARIABLE REVERSE LINE FEED.
For printers which lack the desired functions,
it may suffice to use minimum line feeds to
increment the independent variable, and the
TAB function to plot the dependent variable.
Begin by printing a horizontal line, which
is used as a reference mark for aligning the
paper with the printer's paper guide bar.
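The TAB technique amounts to printing one row per value of the independent variable, with the plotting symbol indented by the scaled dependent variable. A minimal sketch follows; the centre column 45, the amplitude of 40 columns, and the step of pi/40 per row follow the program listing given for Figure 4, while the row count of 41 (0 to 180 degrees inclusive) and the function name are assumptions.

```python
import math


def sine_rows(rows=41, centre=45, amplitude=40, symbol="+"):
    """One text row per step of the independent variable (x = sin y)."""
    step = math.pi / 40  # 4.5 degrees per line feed
    lines = []
    for n in range(rows):
        s = math.floor(0.5 + amplitude * math.sin(step * n))
        lines.append(" " * (centre + s) + symbol)  # equivalent of TAB(45 + S)
    return lines


for line in sine_rows()[:5]:
    print(line)
```

Because the independent variable now runs down the page, the plot appears rotated by ninety degrees relative to Figure 3, which is why it is labelled x = sin y.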

The results of such a technique are shown
in Figure 4.

5 END
10 LPRINT "------------------------------------------------------------"
15 LPRINT : LPRINT : LPRINT : LPRINT : LPRINT
20 END
100 J$ = "     ---------------------------------------------------------------------------------"
105 K$ = "     |                                       |                                       |"
110 LPRINT CHR$(15);CHR$(27)"l"CHR$(17)
115 LPRINT J$;CHR$(18);" 0";CHR$(15)
120 FOR N = 1 TO 19
125 LPRINT K$ : NEXT N
130 LPRINT J$;CHR$(18);"90";CHR$(15)
135 FOR N = 1 TO 19
140 LPRINT K$ : NEXT N
145 LPRINT J$;CHR$(18);"180";CHR$(15)
150 LPRINT K$;CHR$(18)
155 LPRINT " -1";TAB(26);"0";TAB(49);"+1"
160 END
200 LPRINT CHR$(15);CHR$(27)"l"CHR$(17)
210 FOR N = 0 TO 41
220 S = INT(.5 + 40 * SIN(0.07853981634 * N))
230 LPRINT TAB(45 + S);"+" : NEXT N
240 END
300 LPRINT : LPRINT : LPRINT : LPRINT : LPRINT : LPRINT : LPRINT : LPRINT
310 LPRINT TAB(21);"Figure 4" : LPRINT
320 LPRINT TAB(20);"x = sin y"
330 END


To draw the reference line, execute <RUN 10>.
Then turn the printer off and manually position
the platen, using the reference line.

Turn the printer on and execute <RUN 100>.
Turn the printer off and reposition as before.

Repeat the procedure executing <RUN 200>.
Repeat again using <RUN 300>. The resulting
plot will be similar to Figure 4.

STATISTICAL COMPARISON OF THE ABILITY OF
CAMOUFLAGE COLORS TO BLEND WITH TERRAIN BACKGROUND
UNDER HIGH AND LOW SUN ANGLES

George  Anitole  and  Ronald  I.  Johnson 
U.S.  Army  Belvoir  Research  and  Development  Center 
Fort  Belvoir,  Virginia  22060 

Christopher  J.  Neubert 
U.S.  Army  Engineer  School 
Fort  Belvoir,  Virginia  22060 

ABSTRACT 

This  study  determined  the  effect  of  sunlight  angle  upon  the 
effectiveness  of  camouflage  colors  to  blend  with  desert  backgrounds. 

Eleven  U.S.  Marine  personnel  and  two  civilians  subjectively  evaluated  ten 
colors  at  nine  desert  sites,  under  high  and  low  sunlight  angles.  The  best 
six  colors  were  rated  on  a  six  point  scale,  with  the  value  number  one  most 
effective,  and  number  six  not  effective.  An  analysis  of  variance  was 
performed for each site and all nine sites combined to determine
significant (α = 0.05) differences between the best four colors. Tukey's
Studentized Range Test for Variable Ratings identified which of the four
colors differed significantly (α = 0.05) from each other. Only slight
differences were found in the ranking of the colors between the two sun
angles, which eliminates the requirement for low angle sunlight data.

1.0  SECTION  1  -  Introduction 

This  Center  started  its  current  desert  color  evaluations  in  April 
1980,  when  the  Project  Manager,  Saudi  Arabian  National  Guard  (SANG) 
Modernization  requested  camouflage  for  SANG.  Field  color  evaluations  have 
been  conducted  in  Saudi  Arabia  and  the  United  States  desert  southwest. 
During  these  studies  it  was  noted  that  the  camouflage  colors  became 
brighter  in  hue  when  subjected  to  low  sunlight  angles  in  the  early  morning 
or  late  afternoon.  This  observation  led  to  the  question  -  what  effects  do 
high  and  low  sunlight  angles  have  upon  the  judgment  of  how  well  camouflage 
colors  blend  with  the  desert  background?  This  paper  presents  the  results 
of  a  study  conducted  in  the  United  States  deserts  designed  to  answer  the 
above question. It should be noted that if testing is required under both
high and low sunlight angles, the costs and time to run the study would be
about doubled. If evaluations can be completed using one sunlight angle,
the high sunlight angle would be tested rather than the low sunlight angle,
because of its much longer duration in the course of a day.

2.0  SECTION  2  -  Experimental  Design 

2.1 Camouflage Colors

With the exception of the paint colors Gun Metal Gray and Egyptian,
all the colors studied were taken from the SANG color test palette. These
colors were developed over a two-year period, and they represent the most
sophisticated available to determine camouflage effectiveness for a series
of selected different desert sites. The Gun Metal Gray color was selected
to provide high color contrast (in patterns). The Egyptian color is the
paint currently being used to camouflage Egyptian equipment. Two new paints
derived from the Saudi Arabian desert color palette were colors W and X.
Color W is a fifty-fifty mix of colors 7 and 8*, while X is color 11 with
the addition of black paint. All paints were lusterless with a reflectance
of 1% at a 60° angle.

2.2  Test  Targets 

The  test  targets  used  for  this  study  had  to  be  highly  mobile  and  large 
enough  to  permit  a  study  of  the  target  with  various  desert  backgrounds. 

The U.S. Marine Corps made available ten Commercial Utility Cargo Vehicle
(CUCV) trucks, which were painted and coded according to Table 1. Each
truck was painted on the basis of a three-color pattern; the three colors
are identified as colors 1, 2, and 3. For monotones and two-color patterns,
one or more colors are repeated.

2.3 Test Sites

A  total  of  nine  sites  were  selected  for  this  study.  All  the  desert 
sites  contained  sparse  vegetation  similar  to  that  found  in  Saudi  Arabia. 

The soil ranged in color from a light buff/tan to gray and dark brown, and
represented a good cross-sectional spectrum of different colored desert
backgrounds. For example, one site on Midland Road, Blythe, California
had a reddish color, while the site at the Baker, California, dry lake was
dark brown. The site at Jean Dry Lake bed off Route 15 in Nevada was
somewhat yellow in appearance. The order of the nine sites as they will
appear throughout this study is seen in Table 2.

TABLE 1

CUCV Truck Colors

           Color Number
Vehicle    1                 2       3
A          3                 3       3
B          5                 3       1
C          7                 E*      8
D          7                 8       8
F          11                11      11
G          Gun Metal Gray    3       5
H          8                 8       8
I          10                10      10
W          7/8               7/8     7/8
X          AC11              AC11    AC11

* Egyptian Color


*Numerical designations were assigned to colors during prior field tests.

TABLE 2

Site Order Identification

Site #    Color             Location
1         Buff              Yuma Sand Dunes, AZ
2         Light Gray        Ogilby Road, CA
3         Gray-Tan          Baker Sand Dunes, CA
4         Light Buff/Tan    29 Palms, Range 111, CA
5         Light Tan         29 Palms, Tank Trail, CA
6         Reddish Tan       Midland Road, Blythe, CA
7         Yellow-Tan        Jean Dry Lake Bed, Las Vegas, NV
8         Brown             Dry Lake Bed, Baker, CA
9         Dark Tan          Salton Sea, CA

2.4 Test Subjects

The test subjects consisted of eleven U.S. Marine Corps enlisted men
and two civilian employees from the Countersurveillance and Deception
Division, Fort Belvoir, Virginia. The enlisted personnel belonged to the
1st Marine Amphibious Force Service Support Group, Camp Pendleton,
California. Thus, each ground observation consisted of a sample size of
thirteen. Each subject had a corrected visual acuity of at least 20/30 and
normal color vision.

2.5 Data Generation

The object of this study was to determine what effects high and low
sunlight angles have on the ability of camouflage paint colors to blend
with desert backgrounds. The relative rating of these colors under the two
sunlight conditions was compared to determine significant differences.

The ten trucks were painted as shown in Table 1. The trucks were divided
into the following two groups:

A  B  C  F  W
G  H  I  D  X

By using this division, two of the patterned trucks appeared in each of the
two groups along with three monotones. The ground observers (13) were
asked to select three color combinations from each of the two groups, based
upon their subjective judgment of the colors' ability to blend the CUCV
trucks with the desert background.

The next task was to rank the six selected colors on their ability to
blend with the desert background using the following ranking system:

1 - Most effective
2 - Very effective
3 - Effective
4 - Somewhat effective
5 - Less effective
6 - Not effective

No ties were allowed. Each of the six colored trucks was assigned a
number. A value of 7 was assigned for all colors not selected for final
ranking by the ground observers.
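
The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software: the function names `score_observer` and `summarize` are hypothetical, the standard error is assumed to use the sample standard deviation (n - 1 divisor), and the interval is assumed to be the mean plus or minus 1.96 standard errors, which is consistent with the limits printed in Table 3.

```python
import math

# Truck groups as given in Section 2.5.
GROUP_1 = ("A", "B", "C", "F", "W")
GROUP_2 = ("G", "H", "I", "D", "X")


def score_observer(ranking):
    """Turn one observer's ranking into scores for all ten trucks.

    ranking: six truck codes ordered from most effective (rank 1) to
    least effective (rank 6), three taken from each group.
    Trucks that were not selected receive the value 7.
    """
    if len(ranking) != 6 or len(set(ranking)) != 6:
        raise ValueError("exactly six distinct trucks must be ranked (no ties)")
    if sum(t in GROUP_1 for t in ranking) != 3 or sum(t in GROUP_2 for t in ranking) != 3:
        raise ValueError("three trucks must come from each group")
    scores = {truck: 7 for truck in GROUP_1 + GROUP_2}
    for rank, truck in enumerate(ranking, start=1):
        scores[truck] = rank
    return scores


def summarize(values, z=1.96):
    """Return (mean, standard error, lower limit, upper limit)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # assumed n - 1 divisor
    std_error = math.sqrt(variance / n)
    return mean, std_error, mean - z * std_error, mean + z * std_error


example = score_observer(["X", "W", "H", "D", "F", "C"])
never_selected = summarize([7] * 117)  # a truck no observer ever selected
```

A truck that is never selected scores 7 on all 13 x 9 = 117 observations, so its mean is 7 and its standard error is 0.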

3.0  SECTION  3  -  Results 

The results for each site under the high and low sunlight angles are
not included because they would be too voluminous to present in these
proceedings. A summary of the four best colors for each site under high
and low sunlight angles is included in the discussion section. The site
data are available upon request from the U.S. Army Belvoir Research and
Development Center, ATTN: STRBE-JDS, Fort Belvoir, VA 22060. Tables 3-5
and Figure 1 show the data and data analysis averaged across all nine
sites for the high sunlight angle. Tables 6-8 and Figure 2 show the data
and data analysis averaged across all nine sites for the low sunlight
angle. Tables 9-11 and Figure 3 show the data and data analysis for the
combined high and low sunlight angles, used to determine what effects high
and low sunlight angles had upon the camouflage colors in their ability to
blend with the desert background.


TABLE 3

Descriptive Data for CUCV Truck Color Blend with Desert
Background, Averaged Across All Sites, High Sunlight Angle

                           STD ERROR      95% CONFIDENCE INTERVAL
COLOR    N      MEAN       OF COL MEAN    LOWER LIMIT    UPPER LIMIT
A        117    5.76923    0.218300       5.34136        6.19710
B        117    6.27350    0.150461       5.97860        6.56841
C        117    4.76923    0.158920       4.45775        5.08071
D        117    3.83761    0.124956       3.59269        4.08252
F        117    4.28205    0.146099       3.99570        4.56840
G        117    7.00000    0.000000       7.00000        7.00000
H        117    3.82051    0.142922       3.54039        4.10064
I        117    6.70940    0.088425       6.53609        6.88272
W        117    3.60684    0.217140       3.18124        4.03243
X        117    2.92308    0.190843       2.54902        3.29713

TABLE 4

Analysis of Variance for the Best Four Color Blends,
Averaged Across All Sites, High Sunlight Angle

SOURCE    DF     SUM OF SQUARES    MEAN SQUARE    F VALUE    PR>F
COLOR     3      64.59829060       21.53276353    6.15       0.0005
ERROR     464    1623.36752137     3.49863690
TOTAL     467    1687.96581197
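
The entries of Table 4 can be cross-checked from the group means in Table 3. The sketch below is only an arithmetic check under stated assumptions: the function name `one_way_f` is hypothetical, the design is treated as a balanced one-way layout with 117 observations per color, and the error mean square is taken from Table 4 because the raw scores are not reproduced in the paper.

```python
def one_way_f(means, n_per_group, mse):
    """Between-group sum of squares and F ratio for a balanced one-way layout.

    means: group means; n_per_group: observations per group;
    mse: error mean square (taken as given, not recomputed).
    """
    k = len(means)
    grand_mean = sum(means) / k
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    df_between = k - 1
    df_error = k * (n_per_group - 1)
    f_value = (ss_between / df_between) / mse
    return ss_between, df_between, df_error, f_value


# High sunlight angle, colors D, H, W, X (means from Table 3).
ss, df_color, df_error, f_value = one_way_f(
    [3.83761, 3.82051, 3.60684, 2.92308], n_per_group=117, mse=3.49863690
)
```

With these inputs the degrees of freedom are 3 and 464, and the sum of squares and F ratio agree with the COLOR row of Table 4 to the precision of the rounded means.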

Table 4 indicates that there are significant differences in the ability of
the top four colors to blend with the desert background. These differences
are shown in Table 5.

TABLE 5

Significant Differences Between the Top Four Camouflage Colors
(Blend), Averaged Across All Sites, High Sunlight Angle

TUKEY GROUPING    MEAN      N      COLOR
A                 3.8376    117    D
A                 3.8205    117    H
A                 3.6068    117    W
B                 2.9231    117    X

α = 0.05, Degrees of Freedom = 464
Critical Value of Studentized Range = 3.646
Minimum Significant Difference = 0.630546

Color means with the same letter in the grouping column are not
significantly different.
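
The minimum significant difference and the letter groupings can be reproduced from the printed quantities. The following is a minimal sketch, assuming the usual Tukey formula MSD = q * sqrt(MSE / n) for equal group sizes; the function names are hypothetical, and the critical value q is copied from the table rather than computed.

```python
import math


def tukey_msd(q_crit, mse, n_per_group):
    """Minimum significant difference for equal group sizes."""
    return q_crit * math.sqrt(mse / n_per_group)


def tukey_groups(means, msd):
    """Group colors whose means all lie within msd of the group's largest mean.

    means: dict mapping color code to mean. Returns a list of groups
    (lists of color codes), ordered from largest to smallest mean.
    """
    items = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
    groups, last_end = [], -1
    for i in range(len(items)):
        j = i
        while j + 1 < len(items) and items[i][1] - items[j + 1][1] < msd:
            j += 1
        if j > last_end:  # skip runs already contained in an earlier group
            groups.append([name for name, _ in items[i:j + 1]])
            last_end = j
    return groups


msd_high = tukey_msd(3.646, 3.49863690, 117)
groups_high = tukey_groups(
    {"D": 3.8376, "H": 3.8205, "W": 3.6068, "X": 2.9231}, msd_high
)
```

For the high sunlight angle this gives an MSD of about 0.63 and the two groups (D, H, W) and (X), matching the A and B letters in Table 5.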


TABLE 6

Descriptive Data for CUCV Truck Color Blend with Desert
Background, Averaged Across All Sites, Low Sunlight Angle

                           STD ERROR      95% CONFIDENCE INTERVAL
COLOR    N      MEAN       OF COL MEAN    LOWER LIMIT    UPPER LIMIT
A        117    5.76923    0.191703       5.39349        6.14497
B        117    7.00000    0.000000       7.00000        7.00000
C        117    5.31624    0.141385       5.03913        5.59335
D        117    3.81197    0.107850       3.60058        4.02335
F        117    4.21368    0.152988       3.91382        4.51353
G        117    7.00000    0.000000       7.00000        7.00000
H        117    4.18803    0.139961       3.91371        4.46236
I        117    7.00000    0.000000       7.00000        7.00000
W        117    2.19658    0.144268       1.91382        2.47935
X        117    2.50427    0.137675       2.23443        2.77412

TABLE 7

Analysis of Variance for the Best Four Color Blends,
Averaged Across All Sites, Low Sunlight Angle

SOURCE    DF     SUM OF SQUARES    MEAN SQUARE     F VALUE    PR>F
COLOR     3      332.17948718      110.72649573    53.33      0.0001
ERROR     464    963.45299145      2.07640731
TOTAL     467    1295.63247863

Table 7 indicates that there are significant differences in the ability of
the top four colors to blend with the desert background. These differences
are shown in Table 8.

Figure 2 - Camouflage Color Ability to Blend CUCV Truck with Desert
Background, Averaged Across All Nine Sites, Low Sunlight Angle

TABLE 8

Significant Differences Between the Top Four Camouflage Colors
(Blend), Averaged Across All Sites, Low Sunlight Angle

TUKEY GROUPING    MEAN      N      COLOR
A                 4.1880    117    H
A                 3.8120    117    D
B                 2.5043    117    X
B                 2.1966    117    W

α = 0.05, Degrees of Freedom = 464
Critical Value of Studentized Range = 3.646
Minimum Significant Difference = 0.485762

Color means with the same letter in the grouping column are not
significantly different.

TABLE 9

Descriptive Data for CUCV Truck Color Blend with Desert Background,
Averaged Across All Sites, High and Low Sunlight Angles

                           STD ERROR      95% CONFIDENCE INTERVAL
COLOR    N      MEAN       OF COL MEAN    LOWER LIMIT    UPPER LIMIT
A        234    5.76923    0.14495        5.48513        6.05333
B        234    6.63675    0.07875        6.48240        6.79110
C        234    5.04701    0.10719        4.82691        5.25711
D        234    3.82479    0.08236        3.66336        3.98621
F        234    4.24786    0.10557        4.04091        4.45474
G        234    7.00000    0.00000        7.00000        7.00000
H        234    4.00427    0.10053        3.80724        4.20131
I        234    6.85470    0.04513        6.76624        6.94316
W        234    2.90171    0.13803        2.63117        3.17224
X        234    2.71368    0.11821        2.48199        2.83188

TABLE 10

Analysis of Variance for the Best Four Color Blends,
Averaged Across All Sites, High and Low Sunlight Angles

SOURCE    DF     SUM OF SQUARES    MEAN SQUARE    F VALUE    PR>F
COLOR     3      294.58            98.19          33.63      0.0001
ERROR     932    2721.37           2.92
TOTAL     935    3015.95

Table 10 indicates that there are significant differences in the ability of
the top four colors to blend with the desert background. These differences
are shown in Table 11.

Figure 3 - Camouflage Color Ability to Blend CUCV Truck with Desert
Background, Averaged Across All Nine Sites, High and Low Sunlight Angles


TABLE 11

Significant Differences Between the Top Four Camouflage Colors (Blend),
Averaged Across All Sites, High and Low Sunlight Angles

TUKEY GROUPING    MEAN       N      COLOR
A                 4.00427    234    H
A                 3.82479    234    D
B                 2.90171    234    W
B                 2.71368    234    X

α = 0.05, Degrees of Freedom = 932
Critical Value of Studentized Range = 3.764
Minimum Significant Difference = 0.226501

Color means with the same letter in the grouping column are not
significantly different.

4.0  SECTION  4  -  Discussion 

The purpose of this study was to determine if high and low sunlight
angles had a significant effect on the ability of the top four camouflage
colors to blend with the desert background. Tables 3-5 and Figure 1
indicate the ability of each of the ten colors evaluated to blend with the
desert terrain when averaged across all nine sites for a high sunlight
angle. Tables 6-8 and Figure 2 repeat this presentation for the ten
camouflage paint colors, only this time the data were taken under low
sunlight conditions. A look at these figures and tables indicates that the
conditions of high and low sunlight angles do affect the utility of some of
the camouflage colors to blend with the desert terrain. Table 12 shows the
best four camouflage colors for each site, and when averaged across all
nine sites, for high and low sunlight angles. Within each sunlight angle,
the colors are listed from least to most effective, reading left to right.
The table shows that the best four colors differ when the nine sites are
compared separately.

TABLE 12

Summary of the Best Four Color Blends for Each Site and
Across All Sites, High and Low Sunlight Angles

Site    High Sunlight Angle    Low Sunlight Angle
1       B H F A                A H W F
2       C D W X                C D X W
3       C H W X                C D X W
4       C D W X                D A X W
5       H D W X                C D X W
6       H F B A                D H A F
7       X D F H                H F W X
8       C D W X                C D X W
9       D H W X                C D W X
All     D H W X                H D X W


Note that Table 1 shows the colors for each of the alphabetical letters.

For a camouflage color to be effective, it must have camouflage
effectiveness across a wide range of sites. It is too costly and time
consuming to paint equipment for specific areas unless the resources are to
remain in that geographic location for a considerable period of time.
Likewise, only the best four camouflage colors should be of interest for
this study.

Table 12 shows that the best four camouflage colors to blend with the
desert terrain when averaged across all nine sites for the high sunlight
angle were DHWX, with X the best color and D the worst. The same four
colors were also the most effective for the low sunlight angle, reading
worst to best HDXW. The only difference between the two groups is that the
orders of X and W and of H and D are reversed. For both sunlight angles,
colors W and X were better than colors D and H. Therefore, the remaining
task is to determine if H and D, and W and X, differ significantly
(α = 0.05) from each other. Tables 9-11 and Figure 3 indicate the ability
of each of the ten colors evaluated to blend with the desert terrain
averaged across all nine sites and both high and low sunlight angles.
Table 11 indicates that although the colors in Tukey groupings A and B are
significantly different (α = 0.05), there were no significant differences
within the groups. Thus, it can be concluded that the reversals of colors
H and D and of W and X for the high and low sunlight angles are of minor
consequence. From a practical field evaluation standpoint, future studies
can be conducted using only the high sunlight angle because it represents
the longest period of the day.

5.0  SECTION  5  -  Summary  and  Conclusions 

A  total  of  ten  CUCV  vehicles  were  painted  in  camouflage  colors  and 
viewed  by  thirteen  ground  observers  at  nine  desert  sites  in  the  United 
States  desert  southwest.  The  colors  were  divided  into  two  groups  of  five. 
The  best  three  colors  from  each  of  the  two  groups  were  selected  on  their 
ability  to  blend  with  the  desert  terrain.  The  resulting  six  colors  were 
then  ranked  on  their  ability  to  blend  using  a  six  point  scale  with  one 
being  the  best  and  six  being  the  worst.  No  tie  values  were  allowed  and  a 
value  of  seven  was  assigned  to  the  colors  that  did  not  make  the  final  six. 
This  data  was  collected  for  both  high  and  low  sunlight  angles  to  determine 
what  effects  the  lighting  conditions  had  in  the  rating  of  the  different 
camouflage  colors  to  blend  with  the  terrain. 

Analysis of the data indicated that desert colors W and X were better
than H and D for both high and low sunlight angles. The order of W and X
and of H and D was reversed for the two lighting conditions. Additional
statistical analysis revealed that within each Tukey grouping, A and B,
there were no significant differences (α = 0.05). The order reversal of H
and D and of W and X for the two sunlight angle conditions is therefore not
important. It is concluded that future field evaluations should involve
only one sunlight angle. This will be the high sun angle as it represents
a longer period of time for each day.


Weibull Tail Modeling for Estimating Confidence on Quantiles from
Censored Samples


Mark Vangel

U.S. Army Materials Technology Laboratory
Watertown, Massachusetts 02172-0001


This paper describes a simple method for estimating lower
confidence bounds on quantiles from a Weibull tail model.

A two step procedure is proposed for estimating the 100γ% lower
confidence bound for the pth quantile of a Weibull sample of size n.
Parameter estimates are first obtained for a Weibull model fit to
the lower tail values. The inverse of the estimated CDF is then
evaluated at the (1-γ)th quantile of the beta distribution with
parameters np and n(1-p)+1.

This method is proposed as a simple alternative to Lawless'
elaborate conditional procedure specifically for determining
'B-Basis' values. The B-Basis value is defined to be the quantile
corresponding to the lower 95% confidence bound on 90% reliability.
This value is used by the aircraft industry to determine the
acceptability of composite materials. Composite material failure
data is often multimodal, and lower tail modeling is expected to
circumvent this difficulty.

A preliminary Monte Carlo study indicates that the proposed
method compares favorably with the Lawless procedure for obtaining
B-Basis values.

1. Introduction

When  assessing  the  strength  of  composite  materials  for  aircraft 
applications,  an  important  criterion  is  the  material  basis  property, 
defined  as  the  95%  lower  confidence  bound  on  the  stress  at  which  the 
material  fails  with  10%  probability. 

To  be  useful  for  this  application,  a  lower  confidence  bound 
(LCB) estimator must be able to contend with the primary problems of
composite  failure  data  analysis;  that  is,  small  samples  (00)  and 
multiple  failure  modes.  Because  of  this  multimodality,  a  parametric 
model  often  cannot  be  fit  to  an  entire  sample,  and  the  standard 
nonparametric  approach  (e.g.  Conover,  1980),  based  on  the  sample 
order  statistics,  usually  yields  very  conservative  results.  In 
order  to  get  a  useful  estimate  of  the  basis  property  in  this  case, 
recent  work  suggests  modeling  as  much  of  the  tail  as  possible,  and 
considering  the  rest  of  the  sample  as  Type  II  censored  (Breiman, 
Stone,  and  Gins,  1981).  This  paper  develops  a  simple  approximate 
method  based  on  such  a  tail  model  for  estimating  confidence  bounds 
on Weibull quantiles, which is particularly useful for estimating
material  basis  properties  from  small  samples. 
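
To make the conservatism of the order-statistic approach concrete, the following sketch finds which order statistic can serve as a distribution-free lower confidence bound. It is an illustration under stated assumptions, not a reproduction of the cited reference: the function name `nonparametric_rank` is hypothetical, and it relies only on the standard fact that the jth smallest of n observations falls below the pth quantile with probability P(Binomial(n, p) >= j).

```python
import math


def nonparametric_rank(n, p=0.10, gamma=0.95):
    """Largest rank j such that the jth order statistic is a 100*gamma%
    lower confidence bound for the pth quantile, or None if no rank works.
    """
    best = None
    for j in range(1, n + 1):
        # P(y_(j) <= x_p) = P(at least j of the n observations fall below x_p)
        confidence = sum(
            math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(j, n + 1)
        )
        if confidence >= gamma:
            best = j
        else:
            break  # the confidence only decreases as j grows
    return best


rank_28 = nonparametric_rank(28)
rank_29 = nonparametric_rank(29)
```

Under these assumptions no distribution-free 95% bound on the 10% point exists for fewer than 29 observations, and with 29 or 30 observations the bound is simply the smallest value in the sample.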

2.  Review  of  Exact  Methods 

Exact methods for inference on the parameters of the (two
parameter) Weibull distribution

$$F(x) = 1 - \exp\left[-(x/\beta)^{\alpha}\right], \qquad x > 0,$$

are ultimately based on the pivotal random variables for the maximum
likelihood estimators (MLE's). These pivotals are (Thoman, Bain,
and Antle, 1969):

$$Z_1 = \hat{\alpha}/\alpha$$

for the shape parameter ($\alpha$) and

$$Z_2 = \hat{\alpha}\,\ln(\hat{\beta}/\beta)$$

for the scale parameter ($\beta$). That is, $Z_1$ and $Z_2$ have
distributions which depend only on the sample size and on the
censoring configuration, not on the population parameters. The
distributions of these pivotals cannot be written down in closed
form, but may be easily estimated by Monte Carlo. Once the
quantiles of the pivotals have been tabulated for various sample
sizes, exact confidence intervals for the Weibull parameters may be
obtained.
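
The Monte Carlo estimation of the pivotal distributions can be sketched as follows. This is a minimal illustration, not the procedure used for the published tables: it handles complete (uncensored) samples only, the function names `weibull_mle` and `simulate_pivotals` are hypothetical, and the shape estimate is found by plain bisection on the likelihood equation.

```python
import math
import random


def weibull_mle(xs):
    """Return (shape, scale) maximum likelihood estimates, complete sample."""
    ys = [math.log(x) for x in xs]
    n = len(ys)
    ybar = sum(ys) / n
    ymax = max(ys)

    def score(a):
        # Likelihood equation for the shape; it is increasing in a.
        w = [math.exp(a * (y - ymax)) for y in ys]
        return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w) - 1.0 / a - ybar

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    mean_pow = sum(math.exp(shape * (y - ymax)) for y in ys) / n
    scale = math.exp(ymax) * mean_pow ** (1.0 / shape)
    return shape, scale


def simulate_pivotals(n, alpha, beta, reps, seed=0):
    """Sorted Monte Carlo draws of Z1 = a_hat/alpha and Z2 = a_hat*ln(b_hat/beta)."""
    rng = random.Random(seed)
    z1, z2 = [], []
    for _ in range(reps):
        # Inverse-CDF draw from F(x) = 1 - exp(-(x/beta)**alpha).
        xs = [beta * rng.expovariate(1.0) ** (1.0 / alpha) for _ in range(n)]
        a_hat, b_hat = weibull_mle(xs)
        z1.append(a_hat / alpha)
        z2.append(a_hat * math.log(b_hat / beta))
    return sorted(z1), sorted(z2)


# Same random numbers, two different parameter settings.
za1, za2 = simulate_pivotals(10, alpha=2.0, beta=1.0, reps=200, seed=1)
zb1, zb2 = simulate_pivotals(10, alpha=7.5, beta=40.0, reps=200, seed=1)
```

Because the two runs share a seed, the simulated values of the pivotals coincide up to rounding error even though the population parameters differ, which is the pivotal property in action. Empirical quantiles can then be read from the sorted lists.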

Confidence on quantiles of the Weibull cumulative distribution
function can be calculated from the pivotal for the pth quantile
$x_p$, $0<p<1$, which is (Thoman, Bain, and Antle, 1971)

$$Z_p = Z_2 - \ln(-\ln(1-p))\,Z_1 .$$

Of  course,  the  quantiles  of  this  pivotal  must  once  again  be 
determined  by  Monte  Carlo.  The  tables  published  in  the  original 
paper  are  not  always  accurate.  Corrected  tables  are  available 
(e.g. Neal and Spiridigliozzi, 1983).
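
The step from a quantile of the pivotal to a confidence bound on the Weibull quantile can be spelled out briefly. The lines below are a worked sketch assuming the definitions $Z_1 = \hat{\alpha}/\alpha$, $Z_2 = \hat{\alpha}\ln(\hat{\beta}/\beta)$ and $Z_p = Z_2 - w_p Z_1$, with $w_p = \ln(-\ln(1-p))$ and with $z_\gamma$ denoting the $\gamma$ quantile of $Z_p$:

```latex
\begin{aligned}
x_p &= \beta\,(-\ln(1-p))^{1/\alpha},
      \qquad \ln x_p = \ln\beta + w_p/\alpha, \\
Z_p \le z_\gamma
    &\iff \hat{\alpha}\ln(\hat{\beta}/\beta) - w_p\,\hat{\alpha}/\alpha \le z_\gamma \\
    &\iff \ln\hat{\beta} - z_\gamma/\hat{\alpha} \le \ln\beta + w_p/\alpha \\
    &\iff \hat{\beta}\,e^{-z_\gamma/\hat{\alpha}} \le x_p .
\end{aligned}
```

Since $P(Z_p \le z_\gamma) = \gamma$, the quantity $\hat{\beta}\,e^{-z_\gamma/\hat{\alpha}}$ is a $100\gamma\%$ lower confidence bound for $x_p$.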

For censored data, it is necessary to tabulate $Z_p$ for each censoring
situation as well as sample size. Partial tables are available
(Billman, Antle, and Bain, 1972), but any reasonably complete
tabulation would be unwieldy.

Lawless (1979) demonstrated that although the distribution of
$Z_p$ is intractable, the pivotal of the quantile conditioned on the
ancillary statistics (statistics whose distribution does not depend
on the population parameters) may be found in closed form. With the
aid of a computer, a conditional confidence interval for the
quantile can then be obtained without resort to Monte Carlo. This
conditional interval probably does not differ very much from the
unconditional interval (Lawless, 1973).

The Lawless method provides exact conditional intervals for
confidence on the parameters and quantiles of any continuous
location-scale family, as long as the parameter estimators are
equivariant. Equivariant estimators of a location parameter u and a
scale parameter b are functions $\hat{u}$ and $\hat{b}$ of the sample
$x = (x_1, \ldots, x_n)$ such that for any $c_2$ and any $c_1 > 0$

$$\hat{u}(c_1 x + c_2) = c_1\,\hat{u}(x) + c_2$$
$$\hat{b}(c_1 x + c_2) = c_1\,\hat{b}(x)$$

In particular, MLE's are equivariant estimators. A detailed
development of the conditional procedure may be found in Lawless
(1982).

Since a random variable whose logarithm has the extreme value
distribution with location u and scale b,

$$G(x) = 1 - \exp\left(-e^{(x-u)/b}\right),$$

is Weibull with shape ($\alpha$) and scale ($\beta$) given by

$$\alpha = 1/b, \qquad \beta = e^{u},$$

the Lawless procedure applied to the extreme value distribution will
yield the desired confidence on the Weibull quantile. This
procedure is sketched below for Type II censoring. This outline
follows the exposition in Lawless' book (1982).

If the Type II censored sample

$$x_1 \le x_2 \le \cdots \le x_r \qquad (r \le n)$$

is drawn from n independent observations identically distributed as
G(x), and if $\hat{u}$ and $\hat{b}$ are any equivariant estimators of
the extreme value parameters, then:

$$Z_1 = (\hat{u}-u)/\hat{b}, \qquad Z_2 = \hat{b}/b, \qquad Z_3 = (\hat{u}-u)/b,$$

$$Z_p = Z_1 - \ln(-\ln(1-p))/Z_2$$

are all pivotal statistics, with $Z_p$ pivotal for the pth quantile of
G(x). Also, the statistics

$$a_i = (x_i - \hat{u})/\hat{b}, \qquad i = 1, \ldots, r,$$

form a complete set of ancillary statistics of which any r-2 are
functionally independent.

Let  the  corresponding  ordered  extreme  value  sample  be 

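The conditional pdf of $Z_2$ given a is of the form

$$h_2(z \mid a) = \frac{k(a,r,n)\; z^{r-2}\; e^{(z-1)\sum_{i=1}^{r} a_i}}
{\left[\frac{1}{r}\sum\nolimits^{*} e^{a_i z}\right]^{r}}, \qquad z > 0,$$

where k is a constant given a, r, and n. The constant is determined
by numerically integrating the density $h_2(z \mid a)$. Finally, the
conditional distribution of $Z_p$ given a is

$$P(Z_p \le t \mid a) = \int_0^{\infty} h_2(z \mid a)\;
I\!\left(r,\; e^{\,w_p + tz}\sum\nolimits^{*} e^{a_i z}\right) dz ,$$

where

$$\sum\nolimits^{*} w_i = \sum_{i=1}^{r} w_i + (n-r)\,w_r , \qquad
w_p = \ln(-\ln(1-p)),$$

and I(r,s) is the incomplete gamma function

$$I(r,s) = \frac{1}{\Gamma(r)} \int_0^{s} u^{r-1} e^{-u}\, du .$$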

The  Lawless  method  may  be  used  to  calculate  exact  conditional 
confidence  intervals  or  bounds  for  Weibull  quantiles  without  the 
need  for  tables.  The  primary  disadvantage  of  this  procedure  is  its 
complexity.  The  numerical  integration  is  not  trivial,  particularly 
when  r  is  large.  It  is  the  aim  of  this  paper  to  present  a  very 
simple  approximate  method  for  obtaining  intervals  which  are  often 
close to the Lawless results.

3. An Approximate Method for Estimating the LCB of a Quantile

Let $y_1, \ldots, y_r$ be the r smallest order statistics from a
sample of size n with continuous distribution $F(\cdot)$. Let

$$x_p = F^{-1}(p)$$

be the pth quantile of F(x), and let $L_p$ be an estimated 100γ% LCB
for $x_p$. Assume initially that p = j/n for some integer j so that
$y_j$ estimates $x_p$.

Using $y_j$ as an estimator for $x_p$, one obtains the following
approximation:

$$\gamma = P(L_p \le x_p) = P(F(L_p) \le F(x_p)) \approx 1 - P(F(y_j) < F(L_p)) .$$

But $F(y_j)$ has the beta distribution, so that, writing
$u_\gamma = F(L_p)$,

$$1-\gamma = \mathrm{Beta}(u_\gamma \mid j,\, n-j+1)
= \frac{\Gamma(n+1)}{\Gamma(j)\,\Gamma(n-j+1)}
\int_0^{u_\gamma} t^{\,j-1}(1-t)^{\,n-j}\, dt .$$

The approximate LCB is then

$$L_p = \hat{F}^{-1}(u_\gamma) .$$

If j/n ≠ p for integer j, let $u_\gamma$ be the 100(1-γ) percentile from
the Beta(u; pn, (1-p)n+1) distribution.

For the Weibull case

$$L_p = \hat{F}^{-1}(u_\gamma)
= \hat{\beta}\left[\ln\left(1/(1-u_\gamma)\right)\right]^{1/\hat{\alpha}} ,$$

where $\hat{\alpha}$ and $\hat{\beta}$ are the MLE's. This estimator is
identical to Thoman, Bain, and Antle's estimator, except the quantile
of $Z_p$ for appropriate n and r is replaced with a quantile from a beta
distribution.
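
The two steps of the approximate method translate directly into code. The sketch below is illustrative only and makes several assumptions: the function names are hypothetical, the parameter estimates are supplied by the caller (fitting the censored tail is not shown), and only the case of integer j = np is handled, for which the beta CDF can be written as a binomial tail sum and inverted by bisection without any special-function library.

```python
import math


def beta_cdf_int(u, j, n):
    """CDF at u of Beta(j, n - j + 1) for integer 1 <= j <= n.

    Uses the identity P(Beta(j, n-j+1) <= u) = P(Binomial(n, u) >= j).
    """
    return sum(math.comb(n, k) * u**k * (1.0 - u) ** (n - k) for k in range(j, n + 1))


def beta_quantile_int(prob, j, n, tol=1e-12):
    """Invert beta_cdf_int by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, j, n) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_weibull_lcb(alpha_hat, beta_hat, n, p, gamma):
    """Approximate 100*gamma% lower confidence bound on the pth Weibull quantile.

    alpha_hat, beta_hat: estimated shape and scale; n: full sample size.
    """
    j = round(n * p)
    if j < 1 or abs(j - n * p) > 1e-9:
        raise ValueError("this sketch requires n*p to be a positive integer")
    u_gamma = beta_quantile_int(1.0 - gamma, j, n)  # 100(1-gamma) percentile
    return beta_hat * (-math.log(1.0 - u_gamma)) ** (1.0 / alpha_hat)


# Estimates as printed in Figure 3 (n = 20) and Figure 4 (n = 40).
lcb_fig3 = approx_weibull_lcb(1.026, 0.8852, n=20, p=0.1, gamma=0.95)
lcb_fig4 = approx_weibull_lcb(1.061, 1.169, n=40, p=0.1, gamma=0.95)
```

Taking logarithms puts the bounds on the extreme value scale used in the figures. With the estimates printed there, the sketch gives values close to the approximate bounds reported in Figure 3 (-4.03) and Figure 4 (-2.99).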

4. Interpretation of the LCB Estimator as an Approximation to the
Quantile Pivotal

Following Thoman, Bain, and Antle (1971), let the distribution
of $Z_p$ be G(z) and

$$P(Z_p \le z_\gamma) = G(z_\gamma) = \gamma .$$

This implies that

$$(*) \qquad P\left(\hat{\beta}\,e^{-z_\gamma/\hat{\alpha}} \le \beta\,(-\ln(1-p))^{1/\alpha}\right) = \gamma .$$

It is because of (*) that $Z_p$ is pivotal for $x_p$. The new estimator
yields an approximate relation of the same form as (*):

$$P\left(\hat{\beta}\,(-\ln(1-u_\gamma))^{1/\hat{\alpha}} \le \beta\,(-\ln(1-p))^{1/\alpha}\right) \approx \gamma .$$

For this to be an approximation, of course, the left hand sides of
the inequalities should be nearly equal:

$$\hat{\beta}\,(-\ln(1-u_\gamma))^{1/\hat{\alpha}} \approx \hat{\beta}\,e^{-z_\gamma/\hat{\alpha}}$$

or, equivalently,

$$z_\gamma \approx \tilde{z}_\gamma = -\ln(-\ln(1-u_\gamma)) .$$

For the approximation to be useful, the random variable $\tilde{Z}$
should have a distribution close to that of the pivotal $Z_p$ in the
vicinity of the quantiles of interest. Since $\tilde{Z}$ is a simple
transformation of a beta random variable, if

$$u(z) = 1 - e^{-e^{-z}}$$

then the density of $\tilde{Z}$ is

$$f(u(z)) = \frac{\Gamma(n+1)}{\Gamma(j)\,\Gamma(n-j+1)}\;(u-1)\ln(1-u)\;u^{\,j-1}(1-u)^{\,n-j}, \qquad u = u(z) .$$

To graphically illustrate the agreement between the pivotal density
and the density of $\tilde{Z}$, several simulations were performed
(Figure 1, a-d). The values of p and γ were set at .1 and .95
respectively, since the 95% lower confidence bound on 10% probability of
failure is the case of primary interest in aircraft design. The sample
sizes were kept small, reflecting the expected range of sample
sizes of composite failure data: n = 10, 20, 30, 40, and 50. For
each sample size, the upper two thirds of the data was Type II
censored: r = 6, 9, 12, and 15. For $\tilde{Z}$, the exact density is
plotted. For the pivotal, the density is estimated using a four
parameter generalization of Tukey's lambda distribution (Ramberg
et al., 1979) applied to 2,500 Monte Carlo replicates for each case.
The  agreement  between  the  densities  appears  to  be  quite  good,  as 
long  as  one  bears  in  mind  that  for  intervals  with  reasonable 
confidence,  one  need  only  be  concerned  with  the  validity  of  the 
approximation  in  the  tails. 


5. Comparison with the Lawless Method

A  simulation  was  performed  to  directly  compare  the  Lawless 
procedure  with  the  approximation  presented  in  this  paper.  Because 
of  the  computational  effort  required  for  the  Lawless  integration, 
the  scope  of  this  study  was  necessarily  modest.  However,  useful 
results  were  obtained  despite  the  restriction  to  10  replicates  per 
case. It was decided to fix p = .1 and γ = .95 as in the previous
section. Also, the sample size was fixed at 30, since this is
typical for composite material failure data in aircraft industry
testing. Lower confidence bound estimates were obtained for
pseudo-random Weibull samples with shape parameters in the range
2 to 100 and Type II censoring of 90% to 0% (r = 3, 6, 9, ..., 30).
The average percent differences in the results are presented in
Figure 2a. Note that for r = 9, there is remarkably close agreement
between the two methods. This could have been anticipated from the
close agreement at the 95th percentiles of $\tilde{Z}$ and $Z_p$ for
this case (Figure 1d).

For the approximate estimator, the dependence of the simulation
results on the Weibull shape parameter may be completely removed by
transforming the estimator to its pivotal:

$$\alpha \ln(L_p/\beta)$$

This transformation was applied to both the Lawless results and the
approximation results. The percent difference between the methods
for the transformed data showed no dependence on α, so these were
averaged over all the data, providing a measure of percent difference
vs. r based on 150 replicates per r value (Figure 2b). Positive
percent difference is defined here to mean that the Lawless bound
was greater than the approximate bound. For 9 < r ≤ 30, the
approximation yields a conservative result. It is reassuring that
potentially dangerous nonconservative estimates only occur for very
small values of r.

6. Examples

As examples, the approximate method was applied to three
extreme value data sets from the literature (Figures 3 and 4). In
all  of  these  cases,  either  the  approximation  gives  a  result  very 
close  to  that  obtained  via  the  conditional  procedure,  or  the 
approximation  provides  a  result  which  is  more  conservative. 

These  examples,  of  course,  cannot  by  themselves  validate  the 
proposed  method.  They  are  intended  rather  to  highlight  the  ease 
with  which  one  may  arrive  at  reasonable  results,  making  use  of  a 
computer  only  to  obtain  MLE's  of  the  parameters  and,  possibly,  the 
quantiles  of  the  relevant  beta  distribution. 
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7.  Conclusion 


The  proposed  method  is  attractive  as  an  alternative  to  the 
Lawless  procedure.  The  Lawless  method  is  computationally  complex, 
whereas  the  new  method  is  very  easy  to  apply.  Unfortunately,  while 
the  Lawless  method  may  be  justified  theoretically,  the  proposed 
method  as  yet  has  no  firm  theoretical  basis.  The  interpretation  of 
the  new  method  as  an  approximation  to  the  pivotal  is  interesting, 
but  by  itself  it  cannot  provide  this  foundation.  The  natural 
question  of  how  good  this  approximation  is  in  general  cannot  be 
answered because the pivotal distribution can only be obtained by
simulation. For the cases considered, namely a 95% LCB on the 10% point
from samples of size 10 through 50, however, the approximation is good.
Also,  the  method  has  been  demonstrated  to  give  results  for  a  sample 
size  of  30,  which  are  generally  either  close  to  or  more  conservative 
than  the  Lawless  results.  To  validate  the  procedure,  either  an 
extensive  Monte  Carlo  study  or  a  deeper  theoretical  investigation 
must  be  performed.  Both  of  these  approaches  will  be  considered  in 
the  near  future. 
FIGURE 2


MONTE  CARLO  COMPARISON  OF 
APPROXIMATE  METHOD  WITH  LAWLESS  METHOD 


FIGURE  3 


EXAMPLE: LAWLESS (1982), p. 156

Type II censored extreme value sample

n = 20, r = 10

Estimate  90%  confidence  interval  for  X  ^ 

$\hat{u} = -.122$, $\hat{b} = .907$, $\hat{\alpha} = .931/\hat{b} = 1.026$*

$\hat{\beta} = e^{\hat{u}} = .8852$

h  .< 

/seta  (t?l,19)  dt  =  /Beta  ( t ? 1 , 19 )  dt  =  .05 

©  *4, 

A  lift1 

In  [-  $  lnd-Sj)  =  -4.03,  Lawless  =  -3.74 

A  tfr 

In  [- 8  ln(l-s2) »  -1.50,  Lawless  =  -1.49 

*  Unbiased  MLE  (Thoman,  Bain,  and  Antle,  1969) 
FIGURE 4


EXAMPLE: LAWLESS (1975), p. 255

n = 40, r = 28: -2.982, -2.849, ..., .245, .296

Pseudo-random sample from extreme value distribution with u = 0, b = 1

$\hat{u} = .1563$, $\hat{b} = .9104$

$\hat{\alpha} = .966/\hat{b} = 1.061$

$\hat{\beta} = e^{\hat{u}} = 1.169$

Lower 95% confidence bounds

                  x_.10     x_.05
Lawless           -2.71     -3.61
Approximation     -2.99     -3.62
ABSTRACT

The  Lindstrom-Madden  method  of  computing  lower  confidence 
limits  for  series  systems  with  unlike  components  is  extended  to 
series  systems  with  repeated  components  utilizing  the  results  of 
Harris  and  Soms  (1983).  An  exact  solution  is  given  for  no 
failures  and  key  test  results,  together  with  an  approximation  for 
the  general  case.  Numerical  examples  are  also  provided. 


1.  INTRODUCTION  AND  SUMMARY 

A  problem  of  substantial  importance  to  practitioners  in 
reliability  is  the  statistical  estimation  of  the  reliability  of  a 
series  system  of  stochastically  independent  components  when  some 
components  are  repeated,  using  experimental  data  collected  on  the 
individual  components.  In  the  situations  discussed  in  this  paper, 
the  component  data  consist  of  a  sequence  of  Bernoulli  trials. 

Thus, for component $i$, $i = 1,2,\ldots,k$, the data is the pair
$(n_i, Y_i)$, where $n_i$ is the number of trials and $Y_i$ is the
number of observations for which the component functions.
$Y_1, Y_2, \ldots, Y_k$ are assumed to be mutually independent random
variables. We assume that there are $\gamma_i$ components of type $i$,
$1 \le i \le k$. Then the parameter of interest is
$h(p_1, p_2, \ldots, p_k) = h(\mathbf{p})$, the reliability of the system, where

$$h(\mathbf{p}) = \prod_{i=1}^{k} p_i^{\gamma_i}.$$

More specifically, it is desired to obtain a Buehler (1957)
optimal lower $1 - \alpha$ confidence limit on $h(\mathbf{p})$.

The case of $\gamma_1 = \gamma_2 = \cdots = \gamma_k = 1$ has been treated in
Sudakov (1974), Winterbottom (1974), and Harris and Soms (1983).

In  Section  2  we  summarize  the  general  theory  of  Harris  and 
Soms  (1983)  applicable  here.  In  Section  3  the  exact  solutions  to 
no  failures  and  key  test  results  are  given.  Lindstrom-Madden  type 
approximations  are  given  in  Section  4.  Section  5  contains 
numerical  examples. 

2. BUEHLER'S METHOD FOR OPTIMAL CONFIDENCE LIMITS

We now specialize the general results of Harris and Soms
(1983) on optimal confidence limits for system reliability to a
series system with independent and repeated components. As in
Section 1, let

$$h(\mathbf{p}) = \prod_{i=1}^{k} p_i^{\gamma_i},$$

$0 \le p_i \le 1$, $X_i = n_i - Y_i$, $x_i = n_i - y_i$, $1 \le i \le k$,
$S = \{\mathbf{x} \mid x_i = 0,1,\ldots,n_i,\ 1 \le i \le k\}$ and let
$g(\mathbf{x}) = g(x_1, x_2, \ldots, x_k)$ be an ordering function, i.e., for
real $x_i$, $0 \le x_i \le n_i$, $g(\mathbf{x})$ is non-increasing in each
component. It is often convenient to normalize $g(\mathbf{x})$ by letting
$g(\mathbf{0}) = 1$ and $g(\mathbf{n}) = 0$. With such a normalization,
$g(\mathbf{x})$ is often selected to be a point estimator of $h(\mathbf{p})$.
Also let $R = \{r_1, r_2, \ldots, r_s\}$, $s \ge 2$, be the range set of
$g(\mathbf{x})$. With no loss of generality we order $R$ so that
$r_1 > r_2 > \cdots > r_s$ and let
$A_i = \{\mathbf{x} \mid g(\mathbf{x}) = r_i,\ \mathbf{x} \in S\}$,
$i = 1,2,\ldots,s$. The sets $A_i$ constitute a partition of $S$
induced by $g(\mathbf{x})$. We assume throughout that the data are
distributed according to

$$f(\mathbf{x};\mathbf{p}) = P_{\mathbf{p}}(\mathbf{X} = \mathbf{x})
  = \prod_{i=1}^{k} \binom{n_i}{x_i} p_i^{\,n_i - x_i} q_i^{\,x_i}
  = \prod_{i=1}^{k} \binom{n_i}{y_i} p_i^{\,y_i} q_i^{\,n_i - y_i}, \tag{2.1}$$

where $q_i = 1 - p_i$, $i = 1,2,\ldots,k$. With no loss of generality, we
assume $n_1 \le n_2 \le \cdots \le n_k$.


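From these definitions, it follows that

$$P_{\mathbf{p}}\Bigl\{\mathbf{X} \in \bigcup_{i=1}^{j} A_i\Bigr\}
  = P_{\mathbf{p}}\{g(\mathbf{X}) \ge r_j\}. \tag{2.2}$$

From (2.1) and (2.2), we have

$$P_{\mathbf{p}}\{g(\mathbf{X}) \ge r_j\}
  = \sum_{i_1=0}^{u_1} \sum_{i_2=0}^{u_2} \cdots \sum_{i_k=0}^{u_k} f(\mathbf{i};\mathbf{p}), \tag{2.3}$$

where $\mathbf{i} = (i_1, i_2, \ldots, i_k)$ and $u_1$, $u_2 = u_2(i_1)$, $\ldots$,
$u_k = u_k(i_1, i_2, \ldots, i_{k-1})$ are integers determined by $r_j$. Equivalently,

$$P_{\mathbf{p}}\{g(\mathbf{X}) \ge r_j\}
  = \sum_{i_1=0}^{[t_1]} \sum_{i_2=0}^{[t_2]} \cdots \sum_{i_k=0}^{[t_k]} f(\mathbf{i};\mathbf{p}), \tag{2.4}$$

where $t_2 = t_2(i_1)$, $\ldots$, $t_k = t_k(i_1, i_2, \ldots, i_{k-1})$, with

$$t_1 = \sup\{t \mid 0 \le t \le n_1 \text{ and } g(t,0,0,\ldots,0) \ge r_j\}$$

and

$$t_\ell(i_1, i_2, \ldots, i_{\ell-1}) = \sup\{t \mid 0 \le t \le n_\ell \text{ and }
  g(i_1, i_2, \ldots, i_{\ell-1}, t, 0, \ldots, 0) \ge r_j\}, \quad \ell = 2,3,\ldots,k.$$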

We now introduce the notion of Buehler optimal confidence
limits. Let $g(\mathbf{x}) = r_j$. Then define

$$a_{g(\mathbf{x})} = \inf\{h(\mathbf{p}) \mid P_{\mathbf{p}}\{g(\mathbf{X}) \ge g(\mathbf{x})\} \ge \alpha\}. \tag{2.5}$$

Equivalently, by (2.2), we can also write

$$a_{g(\mathbf{x})} = \inf\Bigl\{h(\mathbf{p}) \Bigm| P_{\mathbf{p}}\Bigl\{\mathbf{X} \in \bigcup_{i=1}^{j} A_i\Bigr\} \ge \alpha\Bigr\}. \tag{2.6}$$

Then we have, from Harris and Soms (1983),

Theorem 2.1. $a_{g(\mathbf{x})}$ is a $1 - \alpha$ lower confidence limit for
$h(\mathbf{p})$. If $b_{g(\mathbf{x})}$ is any other $1 - \alpha$ lower confidence
limit for $h(\mathbf{p})$ with $b_{r_1} \ge b_{r_2} \ge \cdots \ge b_{r_s}$, then
$b_{g(\mathbf{x})} \le a_{g(\mathbf{x})}$ for all $\mathbf{x} \in S$.


Two possible choices of $g(\mathbf{x})$ are

$$g(\mathbf{x}) = \prod_{i=1}^{k} \bigl((n_i - x_i)/n_i\bigr)^{\gamma_i}, \tag{2.7}$$

or

$$g(\mathbf{x}) = \prod_{i=1}^{k} \prod_{j=0}^{\gamma_i - 1} \Bigl(\frac{n_i - x_i - j}{n_i - j}\Bigr). \tag{2.8}$$

Both reduce to the generally used $g(\mathbf{x})$ for series systems with
independent components when $\gamma_1 = \gamma_2 = \cdots = \gamma_k = 1$, i.e.,

$$g(\mathbf{x}) = \prod_{i=1}^{k} (n_i - x_i)/n_i.$$

Since (2.7) is the maximum likelihood estimator of $h(\mathbf{p})$ we will
use it here and from now on it will be understood that $g(\mathbf{x})$ is
given by (2.7). With this choice of $g(\mathbf{x})$, we assume from now on
that $0 \le x_i < n_i$, $i = 1,2,\ldots,k$, since $a_{g(\mathbf{x})} = 0$ if some
$x_i = n_i$. With this assumption, the $t_\ell$ in (2.4) are given by

$$t_1 = n_1 - \Bigl(\prod_{i=1}^{k} (n_i - x_i)^{\gamma_i} \Big/ \prod_{i=2}^{k} n_i^{\gamma_i}\Bigr)^{1/\gamma_1} \tag{2.9}$$

and

$$t_\ell = n_\ell - \Biggl(\frac{\prod_{i=1}^{k} (n_i - x_i)^{\gamma_i}}
  {\prod_{s=1}^{\ell-1} (n_s - i_s)^{\gamma_s} \prod_{i=\ell+1}^{k} n_i^{\gamma_i}}\Biggr)^{1/\gamma_\ell}, \tag{2.10}$$

$\ell = 2,\ldots,k$, with $\prod_{i=k+1}^{k} n_i^{\gamma_i} = 1$.

For the purpose of simplifying the calculation of $a_{g(\mathbf{x})}$ in
special cases it is necessary to state additional results from
Harris and Soms (1983).

Theorem 2.2. Let $g(\mathbf{x}) = r_j$ and let

$$f^*(\mathbf{x};a) = \sup_{h(\mathbf{p}) = a} P_{\mathbf{p}}\{g(\mathbf{X}) \ge r_j\}, \quad 0 < a < 1. \tag{2.11}$$

Then

$$\inf_{0<a<1} f^*(\mathbf{x};a) = 0, \qquad \sup_{0<a<1} f^*(\mathbf{x};a) = 1,$$

and $f^*(\mathbf{x};a)$ is strictly increasing in $a$.

Theorem 2.3. $f^*(\mathbf{x};a) = \alpha$ has exactly one solution $a_\alpha$ in $a$
and $a_\alpha = a_{g(\mathbf{x})}$.


3.  EXACT  SOLUTIONS  FOR  ZERO  FAILURES  AND  KEY  TEST  RESULTS 

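We first assume that $\mathbf{x} = (0,0,\ldots,0) = \mathbf{0}$ and use Theorem
2.3 to obtain $a_{g(\mathbf{0})}$.

Theorem 3.1. If $\mathbf{x} = \mathbf{0}$, then

$$f^*(\mathbf{0};a) = \sup_{\prod_{i=1}^{k} p_i^{\gamma_i} = a}\ \prod_{i=1}^{k} p_i^{n_i} = a^{n_j/\gamma_j}, \tag{3.1}$$

where $n_j/\gamma_j = \min_{1 \le i \le k} n_i/\gamma_i$, and

$$a_{g(\mathbf{0})} = \alpha^{\gamma_j/n_j}. \tag{3.2}$$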
Proof.

$$\prod_{i=1}^{k} p_i^{n_i}
  = \Bigl(\prod_{i=1}^{k} p_i^{\gamma_i}\Bigr)^{n_j/\gamma_j}
    \prod_{i \ne j} p_i^{(n_i\gamma_j - n_j\gamma_i)/\gamma_j}
  \le a^{n_j/\gamma_j},$$

since $n_i\gamma_j - n_j\gamma_i \ge 0$ is equivalent to $n_i/\gamma_i \ge n_j/\gamma_j$,
which is true, and therefore
$\prod_{i \ne j} p_i^{(n_i\gamma_j - n_j\gamma_i)/\gamma_j} \le 1$. (3.1) follows by
noting that the choice $p_j = a^{1/\gamma_j}$, $p_i = 1$, $i \ne j$, gives

$$\prod_{i=1}^{k} p_i^{n_i} = a^{n_j/\gamma_j}.$$

Then, using Theorem 2.3, we obtain (3.2), which reduces to the known
series result if $\gamma_1 = \gamma_2 = \cdots = \gamma_k = 1$.
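As a small numerical illustration of (3.2), the following Python sketch evaluates the zero-failure limit $\alpha^{\gamma_j/n_j}$. The function name and the sample inputs are illustrative assumptions, not part of the paper (whose computations were done in FORTRAN).

```python
def zero_failure_lower_limit(n, gamma, alpha):
    """Lower 1 - alpha limit for the zero-failure case, per (3.2).

    n[i]     : number of trials on component type i
    gamma[i] : number of components of type i in the series system
    Returns alpha ** (gamma_j / n_j), where j minimizes n_i / gamma_i.
    """
    if len(n) != len(gamma):
        raise ValueError("n and gamma must have the same length")
    smallest_ratio = min(ni / gi for ni, gi in zip(n, gamma))
    return alpha ** (1.0 / smallest_ratio)


# Hypothetical inputs: three component types, no failures observed.
print(zero_failure_lower_limit([5, 5, 5], [3, 3, 2], 0.10))
```

With all $\gamma_i = 1$ the same function returns $\alpha^{1/n_1}$, the familiar zero-failure limit for a series system of unlike components.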

We now turn to analogues of key test results (see, e.g.,
Winterbottom (1974) and Harris and Soms (1983)). We say that $\mathbf{x}$ is
a key test result if $\gamma_1 = \max_{1 \le i \le k} \gamma_i$ (recall that
$n_1 = \min_{1 \le i \le k} n_i$) and $\mathbf{x} = (x_1, 0, \ldots, 0)$.

Theorem 3.2. If $\mathbf{x}$ is a key test result and

$$\Bigl\{\mathbf{z} \Bigm| \prod_{i=1}^{k} (n_i - z_i)^{\gamma_i} \ge \prod_{i=1}^{k} (n_i - x_i)^{\gamma_i}\Bigr\}
  = \Bigl\{\mathbf{z} \Bigm| \sum_{i=1}^{k} (n_i - z_i) \ge \sum_{i=1}^{k} (n_i - x_i)\Bigr\}, \tag{3.3}$$

then

$$f^*(\mathbf{x};a) = I_{a^{1/\gamma_1}}(n_1 - x_1,\ x_1 + 1), \tag{3.4}$$

where $I_x(a,b)$ is the incomplete beta function. Let $b_\alpha$ denote
the solution in $b$ of

$$\alpha = I_b(n_1 - x_1,\ x_1 + 1).$$

Then $a_{g(\mathbf{x})} = b_\alpha^{\gamma_1}$. Note that $b_\alpha$ is the usual
$1 - \alpha$ lower confidence limit on $p_1$, given $x_1$ failures in $n_1$ trials.

Proof. Without loss of generality we can assume that
$n_1 = n_2 = \cdots = n_k$, for otherwise we can write (2.4) as

[display (3.5) is not recoverable from the source]   (3.5)

where $g(\mathbf{x}) = r_j$, by the monotone likelihood ratio property of
the beta distribution ($I_x(a,b)$ has a monotone likelihood ratio
in $-a$ for fixed $b$, which implies that $I_x(a,b)$ is a
decreasing function of $a$). A similar argument applies to the
other indexes. Thus, if (3.4) is true for $n_1 = n_2 = \cdots = n_k$,
by (3.5) it follows for $n_1 \le n_2 \le \cdots \le n_k$.

So, assuming $\mathbf{n} = (n_1, n_1, \ldots, n_1)$, we seek to maximize

$$P_{\mathbf{p}}\Bigl\{\sum_{i=1}^{k} \sum_{j=1}^{n_1} Y_{ij} \ge \sum_{i=1}^{k} (n_i - x_i) = \sum_{i=1}^{k} y_i\Bigr\}, \tag{3.6}$$

where the $Y_{ij}$ are independent Bernoulli random variables with
parameter $p_i$ and $\prod_{i=1}^{k} p_i^{\gamma_i} = a$. If
$\prod_{i=1}^{k} p_i^{\gamma_i} = a$, then $\prod_{i=1}^{k} p_i$ ranges from
$a^{1/\gamma_j}$ to $a^{1/\gamma_1}$, where $\gamma_j = \min_{1 \le i \le k} \gamma_i$.
This is seen as follows:

$$\prod_{i=1}^{k} p_i
  = \Bigl(\prod_{i=1}^{k} p_i^{\gamma_i}\Bigr)^{1/\gamma_1} \prod_{i=2}^{k} p_i^{1 - \gamma_i/\gamma_1}
  = a^{1/\gamma_1} \prod_{i=2}^{k} p_i^{(\gamma_1 - \gamma_i)/\gamma_1}
  \le a^{1/\gamma_1}$$

and

$$\prod_{i=1}^{k} p_i
  = \Bigl(\prod_{i=1}^{k} p_i^{\gamma_i}\Bigr)^{1/\gamma_j} \prod_{i=1}^{k} p_i^{1 - \gamma_i/\gamma_j}
  = a^{1/\gamma_j} \prod_{i=1}^{k} p_i^{(\gamma_j - \gamma_i)/\gamma_j}
  \ge a^{1/\gamma_j},$$

and the choices $p_1 = a^{1/\gamma_1}$, $p_2 = \cdots = p_k = 1$, and
$p_j = a^{1/\gamma_j}$, $p_i = 1$, $i \ne j$, attain these values. From the
results of Pledger and Proschan (1971), for each
$b = \prod_{i=1}^{k} p_i$, $a^{1/\gamma_j} \le b \le a^{1/\gamma_1}$, (3.6) is
maximized by $p_1 = b$, $p_i = 1$, $2 \le i \le k$. Further, the maximum over
$b$, $a^{1/\gamma_j} \le b \le a^{1/\gamma_1}$, of the maxima for each $b$ is
given by $p_1 = a^{1/\gamma_1}$, $p_i = 1$, $2 \le i \le k$, by the monotone
likelihood ratio property of the binomial distribution, and
$p_1 = a^{1/\gamma_1}$, $p_i = 1$, $2 \le i \le k$, satisfies
$\prod_{i=1}^{k} p_i^{\gamma_i} = a$. This completes the proof.

If $\gamma_1 = \gamma_2 = \cdots = \gamma_k = 1$, some guidelines for the
verification of (3.3) are given in Harris and Soms (1983). In the
present case (3.3) must be verified by trial and error by showing
that

$$\min_{\sum_{i=1}^{k} z_i = x_1}\ \prod_{i=1}^{k} (n_i - z_i)^{\gamma_i}
  = (n_1 - x_1)^{\gamma_1} \prod_{i=2}^{k} n_i^{\gamma_i}$$

and that

$$\max_{\sum_{i=1}^{k} z_i = x_1 + 1}\ \prod_{i=1}^{k} (n_i - z_i)^{\gamma_i}
  < (n_1 - x_1)^{\gamma_1} \prod_{i=2}^{k} n_i^{\gamma_i}.$$

Example 3.1. Let $k = 3$, $\mathbf{n} = (5,5,5)$, $\boldsymbol{\gamma} = (3,3,2)$,
$\alpha = .10$ and $\mathbf{x} = (1,0,0)$. Then

$$\min_{\sum_{i=1}^{3} z_i = 1}\ \prod_{i=1}^{3} (n_i - z_i)^{\gamma_i} = 200000
  \quad\text{and}\quad
  \max_{\sum_{i=1}^{3} z_i = 2}\ \prod_{i=1}^{3} (n_i - z_i)^{\gamma_i} = 140625,$$

so $\mathbf{x}$ is a key test result, (3.3) is satisfied, and hence

$$a_{g(\mathbf{x})} = (.4161)^3 = .0720,$$

where $.10 = I_{.4161}(4,2)$. Further, it can also be verified that
$\mathbf{x} = (2,0,0)$ is a key test result for which (3.3) is satisfied,
but that for $\mathbf{x} = (3,0,0)$, (3.3) is violated.
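The trial-and-error verification of (3.3) in Example 3.1 can be sketched by brute-force enumeration. The Python below is an illustrative sketch, not the author's FORTRAN program: the function names are hypothetical, the check implements the two min/max conditions stated before the example, and the binomial limit is found by bisection on the identity $I_b(n_1 - x_1, x_1 + 1) = P\{\mathrm{Bin}(n_1, b) \ge n_1 - x_1\}$.

```python
from itertools import product
from math import comb, prod


def weighted_product(n, gamma, z):
    """prod_i (n_i - z_i) ** gamma_i for a failure vector z."""
    return prod((ni - zi) ** gi for ni, gi, zi in zip(n, gamma, z))


def check_key_condition(n, gamma, x1):
    """Check the min and max conditions used to verify (3.3) for x = (x1, 0, ..., 0)."""
    grid = list(product(*(range(ni + 1) for ni in n)))
    target = weighted_product(n, gamma, (x1,) + (0,) * (len(n) - 1))
    lowest = min(weighted_product(n, gamma, z) for z in grid if sum(z) == x1)
    highest = max(weighted_product(n, gamma, z) for z in grid if sum(z) == x1 + 1)
    return lowest == target and highest < target, lowest, highest


def binomial_lower_limit(n1, x1, alpha, tol=1e-12):
    """Solve alpha = I_b(n1 - x1, x1 + 1) for b by bisection."""
    def upper_tail(b):
        # P{Bin(n1, b) >= n1 - x1}, which is increasing in b
        return sum(comb(n1, s) * b ** s * (1 - b) ** (n1 - s)
                   for s in range(n1 - x1, n1 + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, gamma, alpha = (5, 5, 5), (3, 3, 2), 0.10
print(check_key_condition(n, gamma, 1))   # minimum 200000, maximum 140625
b_alpha = binomial_lower_limit(5, 1, alpha)
print(round(b_alpha, 4), round(b_alpha ** gamma[0], 4))
```

Enumeration is exponential in $k$, so this is only practical for small systems such as the one in the example.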

Note that Theorem 3.2 asserts that $a_{g(\mathbf{x})} = b_\alpha^{\gamma_1}$ for
$0 < \alpha < 1$. It is thus possible that (3.3) is not true but the
conclusion still holds for $\alpha$ of practical importance. This is
taken up in Section 4.


4. THE LINDSTROM-MADDEN METHOD FOR SERIES SYSTEMS WITH
REPEATED COMPONENTS


When $\gamma_1 = \gamma_2 = \cdots = \gamma_k = 1$, the Lindstrom-Madden method
(henceforth abbreviated L-M) is an approximation to $a_{g(\mathbf{x})}$
of the form

$$b_{g(\mathbf{x})} = \min_{1 \le i \le k} b_\alpha(n_i), \tag{4.1}$$

where

$$\alpha = I_{b_\alpha(n_i)}(n_i - t_{0i},\ t_{0i} + 1) \tag{4.2}$$

with

$$t_{0i} = n_i\Bigl(1 - \prod_{j=1}^{k} (n_j - x_j)/n_j\Bigr), \tag{4.3}$$

i.e., $t_{0i}$ is the maximum of the recursive indexes $t_i$ defined
by (2.4). For the usual levels of $\alpha$, $b_{g(\mathbf{x})} = b_\alpha(n_1)$.
Further, numerical evidence indicates (Harris and Soms (1983)) that for
$\alpha$ levels of practical significance

$$b_{g(\mathbf{x})} \le a_{g(\mathbf{x})}. \tag{4.4}$$

(4.4) was incorrectly claimed to be true for $0 < \alpha < 1$ in
Sudakov (1974) and this is discussed at length in Harris and Soms
(1983). However, (4.4) is known to hold for special cases
(Winterbottom (1974) and Harris and Soms (1983)).


Motivated by the above, we now give an L-M approximation
$b_{g(\mathbf{x})}$ to $a_{g(\mathbf{x})}$ for arbitrary $\gamma_i$ by

$$b_{g(\mathbf{x})} = \min_{1 \le i \le k} b_\alpha(n_i), \tag{4.5}$$

where

$$\alpha = I_{[b_\alpha(n_i)]^{1/\gamma_i}}(n_i - t_{0i},\ t_{0i} + 1) \tag{4.6}$$

with

$$t_{0i} = n_i - \Bigl(\prod_{j=1}^{k} (n_j - x_j)^{\gamma_j} \Big/ \prod_{j \ne i} n_j^{\gamma_j}\Bigr)^{1/\gamma_i}, \tag{4.7}$$

i.e., $t_{0i}$ is the maximum of the recursive indexes $t_i$ defined
by (2.4). However, in this case it is not clear which index $i$
gives the minimum, except that the likely candidate is the one for
which $\gamma_j$, $1 \le j \le k$, is a maximum. We might expect, by analogy,
that for $\alpha$ levels of practical interest

$$b_{g(\mathbf{x})} \le a_{g(\mathbf{x})}. \tag{4.8}$$
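The following Python sketch shows one way the approximation of this section might be evaluated. It rests on an assumed reading of the damaged displays: for each $i$, set $t_{0i} = n_i\,(1 - g(\mathbf{x})^{1/\gamma_i})$ with $g$ as in (2.7), solve $\alpha = I_c(n_i - t_{0i},\ t_{0i} + 1)$ for $c$, and take the minimum of $c^{\gamma_i}$ over $i$. This reading reproduces the first rows of Table I, but it is not confirmed by the source, and the function names are hypothetical. The incomplete beta function is approximated with Simpson's rule so that only the standard library is needed.

```python
from math import exp, lgamma, prod


def reg_inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by composite Simpson's rule.

    A rough numerical sketch, intended for a >= 1 and b >= 1 (bounded integrand).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = exp(lgamma(a + b) - lgamma(a) - lgamma(b))

    def density(t):
        return norm * t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = density(0.0) + density(x)
    for m in range(1, steps):
        total += (4.0 if m % 2 else 2.0) * density(m * h)
    return total * h / 3.0


def beta_quantile(alpha, a, b, tol=1e-10):
    """Solve alpha = I_c(a, b) for c by bisection (I_c is increasing in c)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(mid, a, b) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lm_lower_limit(n, gamma, x, alpha):
    """L-M type approximation under the assumed reading described above."""
    g = prod(((ni - xi) / ni) ** gi for ni, gi, xi in zip(n, gamma, x))
    candidates = []
    for ni, gi in zip(n, gamma):
        t0 = ni * (1.0 - g ** (1.0 / gi))
        c = beta_quantile(alpha, ni - t0, t0 + 1.0)
        candidates.append(c ** gi)
    return min(candidates)


print(round(lm_lower_limit((10, 10), (1, 2), (0, 1), 0.05), 4))
```

A production version would replace `reg_inc_beta` with a library routine for the incomplete beta function; the Simpson approximation is used here only to keep the sketch self-contained.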


5.  NUMERICAL  EXAMPLES 


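For $k = 2$ and selected $\mathbf{n}$, $\boldsymbol{\gamma}$, $\mathbf{x}$, and
$\alpha = .05$ and $.10$, Table I gives $b_{g(\mathbf{x})}$, $a_{g(\mathbf{x})}$
and the best upper bound, $u_{g(\mathbf{x})}$,

$$u_{g(\mathbf{x})} = \min_{1 \le i \le k} u_\alpha(n_i), \tag{5.1}$$

where

$$\alpha = I_{[u_\alpha(n_i)]^{1/\gamma_i}}(n_i - [t_{0i}],\ [t_{0i}] + 1) \tag{5.2}$$

and the remaining quantities are defined as in (4.6).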

TABLE I. L-M Approximations and a_g(x)

(n1,n2)   (γ1,γ2)   (x1,x2)    α     b_g(x)   a_g(x)   u_g(x)

(10,10)   (1,2)     (0,1)     .05    .3670    .3670    .3670
(10,10)   (1,2)     (0,1)     .10    .4398    .4398    .4398
(10,10)   (1,2)     (1,1)     .05    .3045    .3514    .3670
(10,10)   (1,2)     (1,1)     .10    .3715    .4227    .4398
(10,10)   (1,2)     (2,1)     .05    .2484    .3151    .3670
(10,10)   (1,2)     (2,1)     .10    .3088    .3825    .4398
(10,15)   (2,3)     (0,1)     .05    .3695    .3719    .3742
(10,15)   (2,3)     (0,1)     .10    .4425    .4446    .4467
(10,15)   (2,3)     (1,1)     .05    .2554    .3042    .3670
(10,15)   (2,3)     (1,1)     .10    .3167    .3705    .4398
(10,15)   (2,3)     (2,1)     .05    .1712    .1981    .2431
(10,15)   (2,3)     (2,1)     .10    .2203    .2513    .3029
Note that for all the cases in Table I, $b_{g(\mathbf{x})}$ is a lower
bound for $a_{g(\mathbf{x})}$. The computations were done by a short FORTRAN
program, a listing of which can be obtained from the author.


6. CONCLUDING REMARKS

In  this  paper  we  have  extended  the  L-M  method  to  series 
systems  with  repeated  components.  More  work  is  needed  to 
ascertain  the  region  of  validity  of  (4.8). 
HOWARD WAINER*


How to Display Data Badly


Methods for displaying data badly have been developing for many years,
and a wide variety of interesting and inventive schemes have emerged.
Presented here is a synthesis yielding the 12 most powerful techniques
that seem to underlie many of the realizations found in practice. These
12 (the dirty dozen) are identified and illustrated.

KEY WORDS: Graphics; Data display; Data density; Data-ink ratio.


1. INTRODUCTION

The display of data is a topic of substantial contemporary interest and
one that has occupied the thoughts of many scholars for almost 200
years. During this time there have been a number of attempts to codify
standards of good practice (e.g., ASME Standards 1915; Cox 1978;
Ehrenberg 1977) as well as a number of books that have illustrated them
(i.e., Bertin 1973, 1977, 1981; Schmid 1954; Schmid and Schmid 1979;
Tufte 1983). The last decade or so has seen a tremendous increase in
the development of new display techniques and tools that have been
reviewed recently (Macdonald-Ross 1977; Fienberg 1979; Cox 1978; Wainer
and Thissen 1981). We wish to concentrate on methods of data display
that leave the viewers as uninformed as they were before seeing the
display or, worse, those that induce confusion. Although such
techniques are broadly practiced, to my knowledge they have not as yet
been gathered into a single source or carefully categorized. This
article is the beginning of such a compendium.

The aim of good data graphics is to display data accurately and
clearly. Let us use this definition as a starting point for
categorizing methods of bad data display. The definition has three
parts. These are (a) showing data, (b) showing data accurately, and
(c) showing data clearly. Thus, if we wish to display data badly, we
have three avenues to follow. Let us examine them in sequence, parse
them into some of their component parts, and see if we can identify
means for measuring the success of each strategy.

David Andrews, Paul Holland, Bruce Kaplan, James O. Ramsay, Edward
Tufte, the participants in the Stanford Workshop on Advanced Graphical
Presentation, two anonymous referees, the long-suffering associate
editor, and Gary Koch.


2. SHOWING DATA

Obviously, if the aim of a good display is to convey information, the
less information carried in the display, the worse it is. Tufte (1983)
has devised a scheme for measuring the amount of information in
displays, called the data density index (ddi), which is “the number of
numbers plotted per square inch.” This easily calculated index is often
surprisingly informative. In popular and technical media we have found
a range from .1 to 362. This provides us with the first rule of bad
data display.

Figure 1. An example of a low density graph (from SI3 [ddi = .3]).

Figure 2. A low density graph (from Friedman and Rafsky 1981
[ddi = .5]).

Rule  1 — Show  as  Few  Data  as  Possible  (Minimize  the 
Data  Density) 

Figure 3. A low density graph (© 1978, The Washington Post) with
chartjunk to fill in the space (ddi = .2).

Public and Private Elementary Schools, Selected Years: 1929-1970

Figure 4. Hiding the data in the scale (from SI3).

What does a data graphic with a ddi of .3 look like? Shown in Figure 1
is a graphic from the book Social Indicators III (SI3), originally done
in four colors (original size 7" by 9") that contains 18 numbers
(18/63 = .3). The median data graph in SI3 has a data density of .6
numbers/in²; this one is not an unusual choice. Shown in Figure 2 is a
plot from the article by Friedman and Rafsky (1981) with a ddi of .5
(it shows 4 numbers in 8 in²). This is unusual for JASA, where the
median data graph has a ddi of 27. In defense of the producers of this
plot, the point of the graph is to show that a method of analysis
suggested by a critic of their paper was not fruitful. I suspect that
prose would have worked pretty well also.
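The data density index quoted above is simple enough to compute directly. The short Python sketch below reproduces the two figures given in the text; the function name is an illustrative choice, not something defined by Tufte or by this article.

```python
def data_density_index(numbers_plotted, area_sq_in):
    """Numbers plotted per square inch of graphic."""
    if area_sq_in <= 0:
        raise ValueError("area must be positive")
    return numbers_plotted / area_sq_in


# Figure 1: 18 numbers on a 7" by 9" graphic.
print(round(data_density_index(18, 7 * 9), 1))   # 0.3
# Figure 2: 4 numbers in 8 square inches.
print(data_density_index(4, 8))                  # 0.5
```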

THE NUMBER OF PRIVATE ELEMENTARY SCHOOLS

Figure 5. Expanding the scale and showing the data in Figure 4
(from SI3).

Figure 6. Ignoring the visual metaphor (© 1978, The New York Times).

Although arguments can be made that high data density does not imply
that a graphic will be good, nor one with low density bad, it does
reflect on the efficiency of the transmission of information.
Obviously, if we hold clarity and accuracy constant, more information
is better than less. One of the great assets of graphical techniques is
that they can convey large amounts of information in a small space.

We note that when a graph contains little or no information the plot
can look quite empty (Figure 2) and thus raise suspicions in the viewer
that there is nothing to be communicated. A way to avoid these
suspicions is to fill up the plot with nondata figurations, what Tufte
has termed “chartjunk.” Figure 3 shows a plot of the labor productivity
of Japan relative to that of the United States. It contains one number
for each of three years. Obviously, a graph of such sparse information
would have a lot of blank space, so filling the space hides the paucity
of information from the reader.

A convenient measure of the extent to which this practice is in use is
Tufte's “data-ink ratio.” This measure is the ratio of the amount of
ink used in graphing the data to the total amount of ink in the graph.
The closer to zero this ratio gets, the worse the graph. The notion of
the data-ink ratio brings us to the second principle of bad data
display.

Rule  2 — Hide  What  Data  You  Do  Show 
(Minimize the Data-ink Ratio)

One can hide data in a variety of ways. One method that occurs with
some regularity is hiding the data in the grid. The grid is useful for
plotting the points, but only rarely afterwards. Thus to display data
badly, use a fine grid and plot the points dimly (see Tufte 1983,
pp. 94-95 for one repeated version of this).

A second way to hide the data is in the scale. This corresponds to
blowing up the scale (i.e., looking at the data from far away) so that
any variation in the data is obscured by the magnitude of the scale.
One can justify this practice by appealing to “honesty requires that we
start the scale at zero,” or other sorts of sophistry.

In Figure 4 is a plot (from SI3) that effectively hides the growth of
private schools in the scale. A redrawing of the number of private
schools on a different scale conveys the growth that took place during
the mid-1950's (Figure 5). The relationship between this rise and
Brown vs. Topeka School Board becomes an immediate question.

To conclude this section, we have seen that we can display data badly
either by not including them (Rule 1) or by hiding them (Rule 2). We
can measure the extent to which we are successful in excluding the data
through the data density; we can sometimes convince viewers that we
have included the data through the incorporation of chartjunk. Hiding
the data can be done either by using an overabundance of chartjunk or
by cleverly choosing the scale so that the data disappear. A measure of
the success we have achieved in hiding the data is through the data-ink
ratio.

Figure 7. Reversing the metaphor in mid-graph while changing scales on
both axes (© June 14, 1981, The New York Times).

3.  SHOWING  DATA  ACCURATELY 

The essence of a graphic display is that a set of numbers having both
magnitudes and an order are represented by an appropriate visual
metaphor: the magnitude and order of the metaphorical representation
match the numbers. We can display data badly by ignoring or distorting
this concept.

Rule  3 — Ignore  the  Visual  Metaphor  Altogether 

If the data are ordered and if the visual metaphor has a natural order,
a bad display will surely emerge if you shuffle the relationship. In
Figure 6 note that the bar labeled 14.1 is longer than the bar labeled
18. Another method is to change the meaning of the metaphor in the
middle of the plot. In Figure 7 the dark shading represents imports on
one side and exports on the other. This is but one of the problems of
this graph; more serious still is the change of scale. There is also a
difference in the time scale, but that is minor. A common theme in
Playfair's (1786) work was the difference between imports and exports.
In Figure 8, a 200-year-old graph tells the story clearly. Two such
plots would have illustrated the story surrounding this graph quite
clearly.

Rule 4 — Only Order Matters

Figure 8. A plot on the same topic done well two centuries earlier
(from Playfair 1786).

Figure 9. An example of how to goose up the effect by squaring the
eyeball (© 1978, The Washington Post).

One frequent trick is to use length as the visual metaphor when area is
what is perceived. This was used quite effectively by The Washington
Post in Figure 9. Note that this graph also has a low data density
(.1), and its data-ink ratio is close to zero. We can also calculate
Tufte's (1983) measure of perceptual distortion (PD) for this graph.
The PD in this instance is the perceived change in the value of the
dollar from Eisenhower to Carter divided by the actual change. I read
and measure thus:

Actual:   (1.00 - .44)/.44 = 1.27
Measured: (22.00 - 2.06)/2.06 = 9.68

PD = 9.68/1.27 = 7.62

This distortion of over 700% is substantial but by no means a record.
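The arithmetic behind the PD figure can be retraced in a few lines of Python. This is only a sketch of the calculation quoted above; the helper name is an illustrative assumption, and the inputs are the values read off Figure 9 in the text.

```python
def relative_change(first, last):
    """Change from the last value back to the first, relative to the last."""
    return (first - last) / last


actual = relative_change(1.00, 0.44)      # change in the value of the dollar
measured = relative_change(22.00, 2.06)   # change in the measured graphic

# The article divides the two-decimal figures.
pd_rounded = round(measured, 2) / round(actual, 2)
pd_unrounded = measured / actual

print(round(actual, 2), round(measured, 2), round(pd_rounded, 2))
print(round(pd_unrounded, 2))
```

Dividing the rounded figures gives the 7.62 reported in the text; without intermediate rounding the ratio is about 7.61, so the conclusion is unchanged.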

A less distorted view of these data is provided in Figure 10. In
addition, the spacing suggested by the presidential faces is made
explicit on the time scale.

Figure 10. The data in Figure 9 as an unadorned line chart (from
Wainer 1980).

Rule  5 — Graph  Data  Out  of  Context 

Often we can modify the perception of the graph (particularly for time
series data) by choosing carefully the interval displayed. A
precipitous drop can disappear if we choose a starting date just after
the drop. Similarly, we can turn slight meanders into sharp changes by
focusing on a single meander and expanding the scale. Often the choice
of scale is arbitrary but can have profound effects on the perception
of the display. Figure 11 shows a famous example in which President
Reagan gives an out-of-context view of the effects of his tax cut. The
Times' alternative provides the context for a deeper understanding.
Simultaneously omitting the context as well as any quantitative scale
is the key to the practice of Ordinal Graphics (see also Rule 4).
Automatic rules do not always work, and wisdom is always required.

In Section 3 we discussed three rules for the accurate display of data.
One can compromise accuracy by ignoring visual metaphors (Rule 3), by
only paying attention to the order of the numbers and not their
magnitude (Rule 4), or by showing data out of context (Rule 5). We
advocated the use of Tufte's measure of perceptual distortion as a way
of measuring the extent to which the accuracy of the data has been
compromised by the display. One can think of modifications that would
allow it to be applied in other situations, but we leave such expansion
to other accounts.

4.  SHOWING  DATA  CLEARLY 

In this section we discuss methods for badly displaying data that do
not seem as serious as those described previously; that is, the data
are displayed, and they might even be accurate in their portrayal. Yet
subtle (and not so subtle) techniques can be used to effectively
obscure the most meaningful or interesting aspects of the data. It is
more difficult to provide objective measures of presentational clarity,
but we rely on the reader to judge from the examples presented.

Rule  6 — Change  Scales  in  Mid-Axis 

This  is  a  powerful  technique  that  can  make  large  dif¬ 
ferences  look  small  and  make  exponential  changes  l>«>k 
linear. 

In  Figure  12  is  a  graph  that  supports  the  associated 
story  about  the  skyrocketing  circulation  of  The  New 
York  Post  compared  to  the  plummeting  Daily  News 
circulation.  The  reason  given  is  that  New  Yorkers 
“trust”  the  Post.  It  takes  a  careful  look  to  note  the 
700,000  jump  that  the  scale  makes  between  the  two 
lines. 

In  Figure  13  is  a  plot  of  physicians’  incomes  over 
time.  It  appears  to  be  linear,  with  a  slight  tapering  off 
in  recent  years.  A  careful  look  at  the  scale  shows  that  it 
starts  out  plotting  every  eight  years  and  ends  up  plotting 
yearly.  A  more  regular  scale  (in  Figure  14)  tells  quite  a 
different  story. 
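
The effect of an irregular time scale can be checked numerically. The sketch below uses hypothetical data (a series growing 7% per year, not the physicians' incomes of Figure 13) observed first at eight-year gaps and finally yearly. It measures how far the points stray from the straight line joining the first and last points, once with every observation given equal horizontal spacing and once with the true years on the axis. The years, the growth rate, and the function name are all assumptions made for illustration.

```python
import math

# Hypothetical observation years: gaps shrink from 8 years to 1 year.
years = [0, 8, 16, 20, 24, 26, 28, 29, 30, 31, 32]
# Hypothetical series with steady exponential growth of 7% per year.
values = [math.exp(0.07 * t) for t in years]


def max_chord_deviation(xs, ys):
    """Largest vertical distance between the points and the straight
    line joining the first and last points."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return max(abs(y - (ys[0] + slope * (x - xs[0]))) for x, y in zip(xs, ys))


# "Changed scale": every observation gets the same horizontal spacing.
equal_spacing = list(range(len(years)))
dev_equal_spacing = max_chord_deviation(equal_spacing, values)

# Regular scale: the horizontal position is the actual year.
dev_true_years = max_chord_deviation(years, values)

print(f"deviation from a straight line, equal spacing: {dev_equal_spacing:.2f}")
print(f"deviation from a straight line, true years:    {dev_true_years:.2f}")
```

With these assumed numbers the equally spaced version hugs a straight line more closely than the version plotted against the true years, which is the visual trick the rule describes.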

Figure 11. The White House showing neither scale nor context
(© 1981, The New York Times, reprinted with permission).

Figure 12. Changing scale in mid-axis to make large differences
small (© 1981, New York Post).

Figure 13. Changing scale in mid-axis to make exponential growth
linear (© The Washington Post).


Rule 7—Emphasize the Trivial (Ignore the Important)

Sometimes  the  data  that  are  to  be  displayed  have  one 
important aspect and others that are trivial. The graph
can be made worse by emphasizing the trivial part. In
Figure 15 we have a page from SI3 that compares the
income levels of men and women by educational levels.
It reveals the not surprising result that better educated
individuals  are  paid  better  than  more  poorly  educated 
ones  and  that  changes  across  time  expressed  in  constant 
dollars  are  reasonably  constant.  The  comparison  of 
greatest  interest  and  current  concern,  comparing  sal¬ 
aries  between  sexes  within  education  level,  must  be 
made  clumsily  by  vertically  transposing  from  one  graph 
to another. It seems clear that Rule 7 must have been
operating  here,  for  it  would  have  been  easy  to  place  the 
graphs  side  by  side  and  allow  the  comparison  of  interest 
to  be  made  more  directly.  Looking  at  the  problem  from 
a  strictly  data-analytic  point  of  view,  we  note  that  there 
are  two  large  main  effects  (education  and  sex)  and  a 
small  time  effect.  This  would  have  implied  a  plot  that 


[Figure 14 panel title: Incomes of Doctors vs. Other Professionals; horizontal axis: Year]

Figure 14. Data from Figure 13 redone with linear scale (from
Wainer 1980).

[Figure 15 panel title: Median Income of Year-Round, Full-Time Workers 25 to 34 Years Old, by Sex and Educational Attainment: 1968-1977 (constant 1977 dollars)]

Figure 15. Emphasizing the trivial: Hiding the main effect of sex
differences in income through the vertical placement of plots (from
SI3).


showed  the  large  effects  clearly  and  placed  the  smallish 
time  trend  into  the  background  (Figure  16). 


[Figure 16 panel title: Median Income of Year-Round Full-Time Workers 25-34 Years Old by Sex and Educational Attainment: 1968-1977 (in constant 1977 dollars); panels: Males, Females; horizontal axis: Years of Educational Attainment]

Figure 16. Figure 15 redone with the large main effects emphasized
and the small one (time trends) suppressed.

Figure 17. Jiggling the baseline makes comparisons more difficult
(from Handbook of Agricultural Charts).


Rule  8— Jiggle  the  Baseline 

Making comparisons is always aided when the quantities
being compared start from a common base. Thus
we can always make the graph worse by starting from
different bases. Such schemes as the hanging or suspended
rootogram and the residual plot are meant to
facilitate comparisons. In Figure 17 is a plot of U.S.
imports of red meat taken from the Handbook of Agricultural
Charts published by the U.S. Department of
Agriculture. Shading beneath each line is a convention
that indicates summation, telling us that the amount of
each kind of meat is added to the amounts below it.
Because of the dominance of and the fluctuations in
importation of beef and veal, it is hard to see what the
changes are in the other kinds of meat: Is the importation
of pork increasing? Decreasing? Staying constant?
The only purpose for stacking is to indicate graphically
the total summation. This is easily done through the
addition of another line for TOTAL. Note that a
TOTAL will always be clear and will never intersect the
other lines on the plot. A version of these data is shown


[Chart title: U.S. Imports of Red Meats. Source: Handbook of Agricultural Charts, U.S. Department of Agriculture, 19/0, p. 93.]

Figure 18. An alternative version of Figure 17 with a straight line
used as the basis of comparison.

Figure 19. Austria First! Obscuring the data structure by alphabetizing
the plot (from SI3).


in Figure 18 with the separate amounts of each meat, as
well as a summation line, shown clearly. Note how
easily one can see the structure of import of each kind
of meat now that the standard of comparison is a
straight line (the time axis) and no longer the import
amount of those meats with greater volume.
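
The difference between a stacked chart and separate lines with a TOTAL can be sketched in a few lines. The amounts below are hypothetical (they are not the Department of Agriculture figures), and the function names are made up for illustration. The code compares the year-to-year changes in pork as seen from a common baseline with the changes in the upper edge of the pork layer when it is stacked on top of beef and veal.

```python
# Hypothetical yearly import amounts in arbitrary units (not the USDA data).
imports = {
    "beef and veal": [10.0, 14.0, 9.0, 15.0, 11.0],
    "pork": [3.0, 3.2, 3.1, 3.4, 3.3],
    "lamb": [1.0, 0.9, 1.1, 1.0, 1.2],
}
n_years = 5


def stacked_tops(series):
    """Upper edge of each layer when the series are stacked in order."""
    tops = {}
    running = [0.0] * n_years
    for name, values in series.items():
        running = [r + v for r, v in zip(running, values)]
        tops[name] = running
    return tops


def yearly_changes(values):
    """Differences between consecutive years, rounded for display."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


tops = stacked_tops(imports)
total = tops["lamb"]  # the top of the last layer is the overall total

print("pork, common baseline:", yearly_changes(imports["pork"]))
print("pork, stacked on beef:", yearly_changes(tops["pork"]))
print("TOTAL line:", [round(t, 1) for t in total])
```

In the stacked version the pork edge simply inherits the swings of the beef and veal layer beneath it. Because every amount is nonnegative, the TOTAL line can never fall below any single series.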

Rule  9— Austria  First! 

Ordering  graphs  and  tables  alphabetically  can  ob¬ 
scure  structure  in  the  data  that  would  have  been  obvious 
had  the  display  been  ordered  by  some  aspect  of  the 
data.  One  can  defend  oneself  against  criticisms  by 
pointing  out  that  alphabetizing  “aids  in  finding  entries 
of  interest.”  Of  course,  with  lists  of  modest  length  such 
aids  are  unnecessary;  with  longer  lists  the  indexing 
schemes  common  in  19th  century  statistical  atlases  pro¬ 
vide  easy  lookup  capability. 

Figure  19  is  another  graph  from  SI3  showing  life  ex¬ 
pectancies,  divided  by  sex,  in  10  industrialized  nations. 
The  order  of  presentation  is  alphabetical  (with  the 
USSR  positioned  as  Russia).  The  message  we  get  is  that 
there  is  little  variation  and  that  women  live  longer  than 
men.  Redone  as  a  stem-and-leaf  diagram  (Figure  20  is 
simply  a  reordering  of  the  data  with  spacing  propor¬ 
tional  to  the  numerical  differences),  the  magnitude  of 
the  sex  difference  leaps  out  at  us.  We  also  note  that  the 
USSR  is  an  outlier  for  men. 

Rule 10—Label (a) Illegibly, (b) Incompletely,
(c) Incorrectly, and (d) Ambiguously

Figure 20. Ordering and spacing the data from Figure 19 as a
stem-and-leaf diagram provides insights previously difficult to
extract (from SI3).

There are many instances of labels that either do not
tell the whole story, tell the wrong story, tell two or
more stories, or are so small that one cannot figure out
what story they are telling. One of my favorite examples
of small labels is from The New York Times (August


Commission  Payments 
to  Travel  Agents 


Figure 22. Figure 21 redrawn with 1978 data placed on a
comparable basis (from Wainer 1980).


1978),  in  which  the  article  complains  that  fare  cuts  lower 
commission  payments  to  travel  agents.  The  graph  (Fig¬ 
ure  21)  supports  this  view  until  one  notices  the  tiny  label 
indicating  that  the  small  bar  showing  the  decline  is  for 
just  the  first  half  of  1978.  This  omits  such  heavy  travel 
periods  as  Labor  Day,  Thanksgiving,  Christmas,  and  so 
on,  so  that  merely  doubling  the  first-half  data  is  proba¬ 
bly  not  enough.  Nevertheless,  when  this  bar  is  doubled 
(Figure  22),  we  see  that  the  agents  are  doing  very  well 
indeed  compared  to  earlier  years. 

Rule  11— More  Is  Murkier:  (a)  More  Decimal 
Places  and  (b)  More  Dimensions 

We  often  see  tables  in  which  the  number  of  decimal 
places  presented  is  far  beyond  the  number  that  can  be 
perceived  by  a  reader.  They  are  also  commonly 
presented  to  show  more  accuracy  than  is  justified.  A 
display  can  be  made  clearer  by  presenting  less.  In  Table 
1  is  a  section  of  a  table  from  Dhariyal  and  Dudewicz’s 
(1981)  JASA  paper.  The  table  entries  are  presented  to 
five decimal places! In Table 2 is a heavily rounded
version that shows what the authors intended clearly. It
also shows that the various columns might have a substantial
redundancy in them (the maximum expected
gain with b/c = 10 is about 1/10th that of b/c = 100 and
1/100th that of b/c = 1,000). If they do, the entire table
could have been reduced substantially.
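
The rounding and the suspected redundancy can both be checked directly from the Table 1 entries. The sketch below is only an illustration: the variable names and the choice of how many decimals to keep per column are assumptions, picked to mimic the heavily rounded version.

```python
# Expected-gain entries from Table 1, keyed by b/c; rows are N = 3, ..., 10.
table1 = {
    10: [0.20000, 0.26333, 0.32333, 0.38267, 0.44600, 0.50743, 0.56743, 0.62948],
    100: [2.22500, 2.88833, 3.54167, 4.23767, 4.90100, 5.57650, 6.26025, 6.92358],
    1000: [22.47499, 29.13832, 35.79166, 42.78764, 49.45097, 56.33005, 63.20129, 69.86462],
}

# Assumed display precision per column: one decimal, except whole
# numbers for the largest column.
decimals = {10: 1, 100: 1, 1000: 0}
rounded = {
    bc: [round(value, decimals[bc]) for value in column]
    for bc, column in table1.items()
}

# Row-by-row ratios between neighbouring columns.
ratio_100_to_10 = [hi / lo for hi, lo in zip(table1[100], table1[10])]
ratio_1000_to_100 = [hi / lo for hi, lo in zip(table1[1000], table1[100])]

for bc, column in rounded.items():
    print(f"b/c = {bc}: {column}")
print("b/c=100 over b/c=10:   ", [round(r, 2) for r in ratio_100_to_10])
print("b/c=1000 over b/c=100: ", [round(r, 2) for r in ratio_1000_to_100])
```

The ratios stay within a narrow band around 10 to 11 in every row, which is the near-proportionality between columns that the rounded table makes visible at a glance.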

Table 1. Optimal Selection From a Finite
Sequence With Sampling Cost

          b/c = 10.0             b/c = 100.0            b/c = 1,000.0
  N    r*  (G_N(r*) - a)/c    r*  (G_N(r*) - a)/c    r*  (G_N(r*) - a)/c
  3    2       .20000         2      2.22500         2      22.47499
  4    2       .26333         2      2.88833         2      29.13832
  5    2       .32333         3      3.54167         3      35.79166
  6    3       .38267         3      4.23767         3      42.78764
  7    3       .44600         3      4.90100         3      49.45097
  8    3       .50743         4      5.57650         4      56.33005
  9    3       .56743         4      6.26025         4      63.20129
 10    4       .62948         4      6.92358         4      69.86462

NOTE: $g(X_{s+r-1}) = bR(X_{s+r-1}) + a$, if $S = s$, and $g(X_{s+r-1}) = 0$, otherwise.
Source: Dhariyal and Dudewicz (1981).

Just as increasing the number of decimal places can
make a table harder to understand, so can increasing
the number of dimensions make a graph more confusing.
We have already seen how extra dimensions can
cause ambiguity (Is it length or area or volume?). In
addition, human perception of areas is inconsistent.
Just what is confusing and what is not is sometimes only
a conjecture, yet a hint that a particular configuration
will be confusing is obtained if the display confused the
grapher. Shown in Figure 23 is a plot of per share earnings
and dividends over a six-year period. We note (with
some amusement) that 1975 is the side of a bar; the
third dimension of this bar (rectangular parallelopiped?)
chart has confused the artist! I suspect that 1975
is really what is labeled 1976, and the unlabeled bar at
the end is probably 1977. A simple line chart with this
interpretation is shown in Figure 24.

In  Section  4  we  illustrate  six  more  rules  for  displaying 
data  badly.  These  rules  fall  broadly  under  the  heading 
of  how  to  obscure  the  data.  The  techniques  mentioned 
were  to  change  the  scale  in  mid-axis,  emphasize  the 
trivial,  jiggle  the  baseline,  order  the  chart  by  a  charac¬ 
teristic  unrelated  to  the  data,  label  poorly,  and  include 
more  dimensions  or  decimal  places  than  are  justified  or 
needed.  These  methods  will  work  separately  or  in  com¬ 
bination  with  others  to  produce  graphs  and  tables  of 
little  use.  Their  common  effect  will  usually  be  to  leave 
the  reader  uninformed  about  the  points  of  interest  in 
the  data,  although  sometimes  they  will  misinform  us; 
the  physicians’  income  plot  in  Figure  13  is  a  prime  ex¬ 
ample  of  misinformation. 

Finally,  the  availability  of  color  usually  means  that 
there  are  additional  parameters  that  can  be  misused. 
The  U.S.  Census'  two-variable  color  map  is  a  wonderful 
example  of  how  using  color  in  a  graph  can  seduce  us 


Table 2. Optimal Selection From a Finite Sequence
With Sampling Cost (revised)

         b/c = 10      b/c = 100     b/c = 1,000
  N     r*     G      r*     G      r*     G
  3     2     .2      2     2.2     2     22
  4     2     .3      2     2.9     2     29
  5     2     .3      3     3.5     3     36
  6     3     .4      3     4.2     3     43
  7     3     .4      3     4.9     3     49
  8     3     .5      4     5.6     4     56
  9     3     .6      4     6.3     4     63
 10     4     .6      4     6.9     4     70

NOTE: $g(X_{s+r-1}) = bR(X_{s+r-1}) + a$, if $S = s$, and $g(X_{s+r-1}) = 0$, otherwise.

into  thinking  that  we  are  communicating  more  than  we 
are  (see  Fienberg  1979;  Wainer  and  Francolini  1980; 
Wainer  1981).  This  leads  us  to  the  last  rule. 

Rule  12 — If  It  Has  Been  Done  Well  in  the  Past,  Think  of 
Another  Way  to  Do  It 

The  two-variable  color  map  was  done  rather  well  by 
Mayr (1874), 100 years before the U.S. Census version.
He  used  bars  of  varying  width  and  frequency  to  accom¬ 
plish  gracefully  what  the  U.S.  Census  used  varying 
saturations  to  do  clumsily. 

A  particularly  enlightening  experience  is  to  look 
carefully  through  the  six  books  of  graphs  that  William 
Playfair  published  during  the  period  1786-1822.  One 
discovers  clear,  accurate,  and  data-laden  graphs  con¬ 
taining  many  ideas  that  are  useful  and  too  rarely  applied 
today. In the course of preparing this article, I spent
many  hours  looking  at  a  variety  of  attempts  to  display 


Figure 23. An extra dimension confuses even the grapher
(© 1979, The Washington Post).

Figure 24. Data from Figure 23 redrawn simply (from Wainer
1980).

data. Some of the horrors that I have presented were
the fruits of that search. In addition, jewels sometimes
emerged. I saved the best for last, and will conclude
with one of those jewels, my nominee for the title of
“World’s Champion Graph.” It was produced by
Minard in 1861 and portrays the devastating losses suffered
by the French army during the course of Napoleon's
ill-fated Russian campaign of 1812. This graph
(originally  in  color)  appears  in  Figure  25  and  is  re¬ 
produced  from  Tufte’s  book  (1983,  p.  40).  His  narrative 
follows. 


Beginning at the left on the Polish-Russian border near the
Niemen River, the thick band shows the size of the army (422,000
men) as it invaded Russia in June 1812. The width of the band
indicates the size of the army at each place on the map. In September,
the army reached Moscow, which was by then sacked and
deserted, with 100,000 men. The path of Napoleon’s retreat from
Moscow is depicted by the darker, lower band, which is linked to
a temperature scale and dates at the bottom of the chart. It was a
bitterly cold winter, and many froze on the march out of Russia.
As the graphic shows, the crossing of the Berezina River was a
disaster, and the army finally struggled back to Poland with only
10,000 men remaining. Also shown are the movements of auxiliary
troops, as they sought to protect the rear and flank of the advancing
army. Minard’s graphic tells a rich, coherent story with its
multivariate data, far more enlightening than just a single number
bouncing along over time. Six variables are plotted: the size of the
army, its location on a two-dimensional surface, direction of the
army's movement, and temperature on various dates during the
retreat from Moscow.

It  may  well  be  the  best  statistical  graphic  ever  drawn. 


5.  SUMMING  UP 

Although the tone of this presentation tended to be
light  and  pointed  in  the  wrong  direction,  the  aim  is 
serious.  There  are  many  paths  that  one  can  follow  that 
will  cause  deteriorating  quality  of  our  data  displays;  the 
12  rules  that  we  described  were  only  the  beginning. 
Nevertheless,  they  point  clearly  toward  an  outlook  that 
provides  many  hints  for  good  display.  The  measures  of 
display  described  are  interlocking.  The  data  density 
cannot  be  high  if  the  graph  is  cluttered  with  chartjunk; 
the data-ink ratio grows with the amount of data displayed;
perceptual distortion manifests itself most frequently
when additional dimensions or worthless metaphors
are included. Thus, the rules for good display are
quite  simple.  Examine  the  data  carefully  enough  to 
know what they have to say, and then let them say it
with  a  minimum  of  adornment.  Do  this  while  following 
reasonable  regularity  practices  in  the  depiction  of  scale, 
and  label  clearly  and  fully.  Last,  and  perhaps  most  im¬ 
portant,  spend  some  time  looking  at  the  work  of  the 
masters  of  the  craft.  An  hour  spent  with  Playfair  or 
Minard  will  not  only  benefit  your  graphical  expertise 
but will also be enjoyable. Tukey (1977) offers 236
graphs and little chartjunk. The work of Francis Walker
(1894) concerning statistical maps is clear and concise,
and it is truly a mystery that their current counterparts
do  not  make  better  use  of  the  schema  developed  a  cen¬ 
tury  and  more  ago. 

[Received September. Revised September 1983.]

ACCELERATED LIFE TEST: AN OVERVIEW
AND SOME RECENT ADVANCES

ABSTRACT. Statistical inferences on the durability of a product may often
have to be based on an analysis of failure data generated under an overstress or
accelerated life test (ALT). The effectiveness of such inferences rests heavily
on the validity of model assumptions concerning the life distribution and the
effect of stress acceleration. In this article, the principal methodological
approaches to ALT analysis are reviewed in light of plausibility of the model,
flexibility of empirical fit, and usefulness in practical application. These
include parametric log-linear models, semi-parametric formulations based on
proportional hazards or time transformation, and a reciprocal-linear regression
model in the setting of a Brownian motion process for damage growth. Some
theoretical considerations and practical issues of designing an ALT experiment are
also discussed.


1. INTRODUCTION. A problem frequently encountered in engineering research
and development is to ascertain the durability or service life of a new product or
to compare alternative designs of the same product. Usually, the long life of the
product and the much shorter time available for testing purposes impair our
ability to collect failure data by conducting tests under its normal conditions of
use. With an accelerated life test (ALT), prototypes of the product are subjected to
stress conditions that are more severe than those encountered in normal use so that more
failures are apt to take place in a limited time. Data of failure times under
such over-stress conditions are then analyzed in the framework of a statistical
model, and inferences are drawn in regard to life length or reliability of the
product under its normal use condition.

Another means of reducing the test time, called censored sampling, consists
of testing a larger number of units in order to observe a smaller number of
failures, namely those that occur early. Censored life tests under normal use conditions
are useful as long as failures are likely to occur within the permitted test time.
When that is not the case, ALT is the only means of getting some failure data. In
practice, ALT and censoring are often coupled in the same experiment toward the
common goal of cost and time savings.

With  technological  advances  leading  to  enhancement  of  product  life,  ALT  is 
assuming  an  ever  increasing  role  in  engineering  experimentation.  The  last  two 
decades  have  seen  a  large  growth  of  literature  in  statistical  methodology  for  ALT 
analysis. The diversity of practical application has increased at the same time.
A few examples are: self-lubricated bearings for high vacuum application (Meeks
1980) tested under high speed stresses, stress-rupture of Kevlar-epoxy composite
(Glaser 1984) under tensile and temperature stresses, twisted nematic
liquid-crystal display (Kitagawa et al. 1984) under accelerated voltage stresses,
insulation resistance of high K multilayer ceramic capacitors (Minford 1982)
under voltage and temperature stresses, and failure of power cable insulation
(Lyle and Kirkland 1981) under temperature, moisture and voltage stresses. The
conduct and analysis of such experiments often draw a great deal from theoretical
models of chemical reaction, metal fatigue, creep rupture, wear, etc., as is
relevant to the particular physical process of failure, and are aided by empirical
evidence and statistical tools. The subject matter is heavily interdisciplinary,
and accordingly, the relevant literature is scattered in journals of several
disciplines. Our discussion will be limited to the major statistical models and
methodology of ALT analysis.

Research supported by Office of Naval Research under
Grant N00014-78-C-0722.

To introduce the basic statistical issue of ALT we let the random variable
$y$ represent the life-length or time-to-failure of a material specimen, component
or a system. The probability distribution of $y$ depends on some identifiable
environmental conditions or stresses $x$ which are manipulated in the experiment.
Denote by $x_0$ the normal use-condition stress level. In an ALT experiment, a
number of larger than normal stress settings $x_i$, $i = 1,\ldots,k$, are chosen. A
sample of $n_i$ units is subjected to the constant setting $x_i$ and either all
their failure times are observed (full sample) or only some early failures are
recorded (censored sample), $i = 1,\ldots,k$. Thus, samples are generated from the
accelerated life distributions $F(y|x_i)$, $i = 1,\ldots,k$, where $F(y|x)$ denotes the
cdf of $y$ under the stress level $x$. Based on such data, one wishes to make
inferences on some relevant characteristics of $F(y|x_0)$ such as its mean,
selected percentiles, and the reliability $\bar{F}(t|x_0)$ for a mission time $t$, where
$\bar{F} = 1 - F$. Another variant, called step-stress ALT, allows the stress setting for
each unit to be changed at specified intervals until failure occurs. For now we
confine our attention to constant stress ALT; step-stress ALT experiments will be
discussed in Section 5.
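
The data structure of a constant-stress ALT with censoring can be made concrete with a small simulation. This is a sketch under stated assumptions only: the exponential life distribution, the function `hypothetical_mean_life`, the stress levels, and the sample sizes are all invented for illustration and do not come from any of the cited experiments.

```python
import random


def hypothetical_mean_life(stress):
    """Assumed mean life at a given stress; any decreasing function would do."""
    return 1000.0 / stress ** 2


def type2_censored_sample(mean_life, n_units, n_failures, rng):
    """Test n_units together and stop at the n_failures-th failure.

    Returns the ordered failure times that were actually observed; the
    remaining n_units - n_failures units are censored at the last one.
    """
    lifetimes = sorted(rng.expovariate(1.0 / mean_life) for _ in range(n_units))
    return lifetimes[:n_failures]


rng = random.Random(1985)           # fixed seed so the run is repeatable
use_stress = 1.0                    # x_0, never tested directly
test_stresses = [2.0, 3.0, 4.0]     # x_1, ..., x_k, all above x_0
n_units, n_failures = 10, 6         # n_i and the number of failures recorded

samples = {
    x: type2_censored_sample(hypothetical_mean_life(x), n_units, n_failures, rng)
    for x in test_stresses
}
for x, failures in samples.items():
    print(f"stress {x}: first {len(failures)} of {n_units} failure times:",
          [round(t, 1) for t in failures])
```

The inference problem is then to use these samples, all taken above `use_stress`, to say something about the life distribution at `use_stress` itself.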

A  related  area  of  research  is  survival  analysis  in  biostatistics  which  also 
deals  with  time  (survival  time,  time  to  cure  or  time  to  onset  of  a  disease)  as  the 
dependent  variable  and  its  dependence  on  such  covariates  as  age,  physiological  and 
environmental  conditions  of  the  patient.  Therefore,  between  ALT  and  survival 
analysis,  the  basic  concepts,  models  and  methods  have  much  in  common.  However, 
considerable  differences  exist  in  regard  to  the  conduct  of  the  experiment,  type  of 
data,  role  of  the  covariates  and  the  target  of  inference.  For  instance,  survival 
analysis  typically  deals  with  a  much  larger  set  of  covariates  than  is  involved  in 
an  ALT,  lesser  control  on  the  settings  of  the  covariates,  and  lesser  control  on 
the process of data collection, which leads to more complex patterns of censoring.
Also, its emphasis is toward studying the effects of some covariates after
adjusting for the effects of the others, not so much to predict $F(y|x_0)$. In

fact,  the  concept  of  a  normal  setting  for  the  covariates  is  not  meaningful  in 
survival  analysis.  Both  of  these  areas  can  be  brought  under  the  umbrella  name  of 


regression  analysis.  In  essence,  ALT  calls  for  regression  analysis  under 
non-standard  statistical  models  as  well  as  data  types,  and  its  major  goal  is  to 
make  predictions  beyond  the  range  of  the  experimental  setting.  In  light  of  the 
last  point,  it  is  obvious  that  theoretical  modeling  or  understanding  of  the 
failure  process  plays  a  far  more  important  role  than  empirical  model  fitting. 

Inferences from ALT data require two basic ingredients of model
formulation: the underlying life distribution F(y|x) for a given stress x,
and the functional relationship among these distributions with varying x. The
latter is sometimes called the acceleration function. The object of this paper is
to  give  a  brief  survey  of  the  various  approaches  to  model  formulation  and  the 
associated  methods  of  statistical  inference.  To  organize  the  exposition,  we  set 
out  with  a  broad  classification  of  the  major  areas  of  development  in  ALT  analysis: 
(a)  Parametric  life  models  with  log-linear  acceleration  function,  (b)  Semi- 
parametric  approaches  based  on  hazard  rate  and  time-acceleration  models,  (c) 
Stochastic  damage  growth  models,  (d)  Special  constructs  for  step-stress  ALT,  and 
(e)  Issues  of  designing  an  ALT  experiment. 

Log-linear  (LL)  acceleration  functions  in  the  framework  of  important 
parametric  models  for  the  underlying  life  distribution  dominated  the  early 
developments  of  ALT  analysis.  An  extensive  literature  has  developed  both  in 
methodological advances and diverse applications. A good survey of the earlier
developments is available in Chapter 9 of Mann, Schafer and Singpurwalla (MSS)
(1974). The proportional hazards model, due to Cox (1972), is a semi-parametric
formulation  that  has  been  found  instrumental  to  survival  analysis  in 
biostatistics,  and  has  led  to  major  advances  in  handling  arbitrarily  censored 
data.  Application  of  these  methods  to  ALT  is  somewhat  limited  because  the  model 
is  empirical  and  also  the  data  type  and  object  of  inference  are  different.  The 
semi-parametric  and  nonparametric  approaches  stem  from  ideas  of  greater  generality 
but  they  typically  require  larger  sample  sizes  for  sensible  inferences.  Also,  an 
extrapolation  is  less  dependable  when  it  is  based  on  a  purely  empirical 
acceleration  function.  Areas  of  relatively  recent  developments  include  (c)  and 
(d).  For  brevity,  our  discussion  in  Sections  2-5  will  focus  on  the  motivation  and 
description  of  the  various  models  and  will  include  only  an  outline  of  the 
principal  analytical  methods.  Technical  details  as  well  as  treatment  of  special 
cases  under  each  class  of  models  will  be  omitted  with  references  provided  for  the 
interested  reader.  Section  6  deals  with  designing  an  ALT  experiment  and  discusses 
the  usefulness  of  some  optimality  criteria. 

2. PARAMETRIC LOG-LINEAR MODELS. A general formulation, called the parametric
log-linear (LL) model, consists of the following assumptions: (a) the underlying
life distribution belongs to a specified parametric family involving a scale
parameter $\theta$ and possibly also a shape parameter $\eta$, (b) the scale parameter
depends on the stress $x$ according to an LL-relation $\log\theta = \beta' x$ while $\eta$
is a constant independent of $x$. Here $x$ is a $p$-vector whose components need
not correspond to all distinct stress variables; some may be just different
functions of the same variable. For instance, with temperature as the sole stress
variable $x$, the quadratic function $\beta_0 + \beta_1 x + \beta_2 x^2$ satisfies this formulation with
$x' = (1, x, x^2)$ and $\beta' = (\beta_0, \beta_1, \beta_2)$.
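
As a small illustration of the LL relation with a single stress variable entering through several functions, the sketch below builds the regressor vector $(1, x, x^2)$ and evaluates $\theta$ from $\log\theta = \beta' x$. The coefficient values and function names are hypothetical.

```python
import math


def stress_vector(x):
    """Regressor vector (1, x, x^2) built from a single stress variable x."""
    return [1.0, x, x * x]


def scale_parameter(beta, regressors):
    """Scale parameter theta satisfying log(theta) = beta' x."""
    return math.exp(sum(b * r for b, r in zip(beta, regressors)))


beta = [5.0, -0.8, 0.05]   # hypothetical (beta_0, beta_1, beta_2)
x = 2.0                    # hypothetical stress level
theta = scale_parameter(beta, stress_vector(x))

print("log(theta) =", round(math.log(theta), 6))
print("theta      =", round(theta, 3))
```

Here $\log\theta = 5 - 0.8(2) + 0.05(4) = 3.6$, which is linear in the coefficients even though it is quadratic in the stress.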

The  choice  of  a  life  distribution  is  guided  by  such  criteria  as  its 
theoretical  basis  in  reliability,  simplicity  of  inference  procedures  and 
flexibility  of  empirical  fit.  Distributions  derived  from  Poisson  shocks,  extreme 
value  theory,  failure  rate  behavior  or  those  with  good  track  record  in  fitting  to 
life  data  are  the  natural  candidates.  These  include  the  exponential,  Weibull, 
gamma  and  lognormal  distributions.  The  assumption  of  an  LL  relation  to  stress  is 
not  only  simple  and  flexible  but  is  also  motivated  in  many  practical  contexts  from 
theoretical  constructs  based  on  chemical  kinetics,  activation  energy,  principles 
of quantum mechanics, etc. The Arrhenius reaction rate model, inverse power law,
Eyring model, and generalized Eyring model are some of the widely used engineering
models which fit into the LL formulation. These are respectively given by

$\theta = \exp(A - B/x)$, temperature stress
$\theta = (A/x)^{P}$, voltage stress
$\theta = x \exp(A - B/x)$, temperature stress
$\theta = A x_1 \exp(-B/x_1) \exp(C x_2 + D x_2/x_1)$, temperature and voltage stresses.

(2.1)
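
To see why these models fit the LL formulation, one can take logarithms and compare with a linear form in transformed stresses. The sketch below codes the four relations as written in (2.1) and checks the log-linear representation numerically. The parameter values, stress values, and function names are assumptions chosen only for illustration. The Eyring forms carry a known offset, $\log x$ or $\log x_1$, in addition to the terms that are linear in the unknown constants.

```python
import math


def arrhenius(x, A, B):
    return math.exp(A - B / x)


def inverse_power(x, A, P):
    return (A / x) ** P


def eyring(x, A, B):
    return x * math.exp(A - B / x)


def generalized_eyring(x1, x2, A, B, C, D):
    return A * x1 * math.exp(-B / x1) * math.exp(C * x2 + D * x2 / x1)


def log_linear(beta, regressors):
    """Linear predictor beta' x for log(theta)."""
    return sum(b * r for b, r in zip(beta, regressors))


# Hypothetical values: x, x1 play the role of temperature, x2 of voltage.
x, x1, x2 = 350.0, 350.0, 5.0
A, B, C, D, P = 2.0, 1500.0, -0.3, 20.0, 3.0

checks = {
    # log(theta) = A - B/x
    "Arrhenius": (math.log(arrhenius(x, A, B)),
                  log_linear([A, -B], [1.0, 1.0 / x])),
    # log(theta) = P log(A) - P log(x)
    "inverse power": (math.log(inverse_power(x, A, P)),
                      log_linear([P * math.log(A), -P], [1.0, math.log(x)])),
    # log(theta) - log(x) = A - B/x
    "Eyring": (math.log(eyring(x, A, B)) - math.log(x),
               log_linear([A, -B], [1.0, 1.0 / x])),
    # log(theta) - log(x1) = log(A) - B/x1 + C x2 + D x2/x1
    "generalized Eyring": (
        math.log(generalized_eyring(x1, x2, A, B, C, D)) - math.log(x1),
        log_linear([math.log(A), -B, C, D], [1.0, 1.0 / x1, x2, x2 / x1])),
}
for name, (direct, linear) in checks.items():
    print(f"{name}: direct {direct:.6f}, log-linear form {linear:.6f}")
```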

Statistical  inferences  including  estimation  of  the  model  parameters  and 
setting  confidence  bounds  for  the  mean  life  or  a  specified  percentile  of  the  life 
distribution  at  use  condition  stress  as  well  as  model  checking  and  goodness-of- 
fit  are  extensively  treated  in  the  literature  under  various  distributional 
assumptions  and  specific  engineering  models.  One  general  body  of  methodology  is 
based on maximum likelihood (ML) estimation, the Fisher information matrix and
the associated asymptotic normal approximation. The technical details vary
according to the specific models and data types, and the plethora of results is
beyond  the  scope  of  this  brief  survey.  The  reader  may  refer  to  Chapter  9  of  MSS 
(1974)  for  some  details  and  also  the  relevant  references. 

In  general,  the  maximum  likelihood  method  in  the  ALT  context  and  especially 
with  censored  data  involves  considerable  computational  complexity,  and  lacks  a 
grip  on  the  small  sample  properties  of  the  estimators.  Some  interesting 
alternative  procedures  have  been  developed  for  the  case  of  location-scale 
parameter  families  for  the  distribution  of  the  log-life.  In  particular,  the 
logarithms of Weibull and lognormal random variables have the Gumbel extreme value
and  normal  distributions,  respectively,  each  of  which  constitutes  a  location- 
scale  family. 
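
The Weibull case can be verified in a few lines. If the life $T$ has survival function $\exp\{-(t/\theta)^{\eta}\}$, then $P(\log T > u) = \exp\{-\exp[(u - \log\theta)/(1/\eta)]\}$, which is the smallest extreme value (Gumbel) survival function with location $\log\theta$ and scale $1/\eta$. The sketch below checks this identity numerically; the parameter values and function names are hypothetical.

```python
import math


def weibull_survival(t, theta, eta):
    """P(T > t) for a Weibull life with scale theta and shape eta."""
    return math.exp(-((t / theta) ** eta))


def gumbel_min_survival(u, location, scale):
    """P(U > u) for the smallest extreme value (Gumbel) distribution."""
    return math.exp(-math.exp((u - location) / scale))


theta, eta = 120.0, 1.7                 # hypothetical scale and shape
location, scale = math.log(theta), 1.0 / eta

pairs = []
for t in [10.0, 60.0, 120.0, 300.0]:
    direct = weibull_survival(t, theta, eta)
    via_log = gumbel_min_survival(math.log(t), location, scale)
    pairs.append((direct, via_log))
    print(f"t = {t}: Weibull {direct:.6f}, Gumbel on log scale {via_log:.6f}")
```

The stress therefore enters the log-life only through the location $\log\theta = \beta' x$, while the scale $1/\eta$ stays fixed, which is what makes location-scale methods applicable.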

A simple estimation procedure with type II censored data, proposed by
Nelson and Hahn (1972, 1973), is based on an application of the least squares
method in two stages. To outline the idea we consider a p-vector x of stress
variables with k settings $x_1,\ldots,x_k$. At $x_i$, $n_i$ units are simultaneously
tested and observed till the $r_i$-th failure occurs so for each i = 1,...,k we have
a type II right censored sample $y_{i1} < y_{i2} < \cdots < y_{ir_i}$. With a minor misuse
of notation, here we take y to be the log-life so the censored sample comes from
a pdf of the form $\sigma^{-1} g[(y - \lambda_i)/\sigma]$ where $\lambda_i = \beta'x_i$, g is a completely specified
pdf (standard extreme-value, normal, etc.), and $\beta$ and $\sigma$ are the unknown
parameters. For simplicity, we confine further discussion to equal sample
sizes and equal censoring, that is, $n_i = n$ and $r_i = r$ for all i.

In stage 1 we ignore the regression structure and estimate the parameters
$(\lambda_i, \sigma)$ from the ith data set by the method of least squares applied to the linear
model

$$ y_{ij} = \lambda_i + \sigma V_{ij}, \quad j = 1,\ldots,r \tag{2.2} $$

where $V_{ij}$, j = 1,...,r are the first r order statistics of a sample of size
n from the standardized pdf g. Their means $c_j = E(V_{ij})$ and covariances
$a_{jj'} = \mathrm{cov}(V_{ij}, V_{ij'})$ are known constants, and their tables are available for
some distributions. We thus have the stage-1 best linear unbiased estimators
(BLUE) of the form

$$ \lambda_i^* = \sum_{j=1}^{r} a_j y_{ij}, \qquad \sigma_i^* = \sum_{j=1}^{r} b_j y_{ij}, \tag{2.3} $$

as well as their exact covariance matrix

$$ \sigma^2 \begin{pmatrix} d_1 & d_3 \\ d_3 & d_2 \end{pmatrix} \tag{2.4} $$

where $a_j$, $b_j$, $d_1$, $d_2$ and $d_3$ are known constants.


In stage 2, we denote $\lambda^* = (\lambda_1^*,\ldots,\lambda_k^*)'$, $\sigma^* = (\sigma_1^*,\ldots,\sigma_k^*)'$, and form
the linear model

$$ \lambda^* = X\beta + e_\lambda, \qquad \sigma^* = \mathbf{1}\sigma + e_\sigma \tag{2.5} $$

where $X' = (x_1,\ldots,x_k)$, $\mathbf{1}$ is the k-vector of ones, and the pair $(e_\lambda, e_\sigma)$ has
mean (0,0), its elements are independent across rows and have the covariance
structure (2.4) across columns. Based on this linear model, the BLUE's are
obtained as

$$ \tilde\beta = (X'X)^{-1}X'\lambda^*, \qquad \tilde\sigma = \frac{1}{k}\sum_{i=1}^{k}\sigma_i^*. \tag{2.6} $$

The mean log-life at the use condition stress $x_0$ as well as any
percentile is of the form $\beta'x_0 + c\sigma$ which is a linear combination of $\beta$ and $\sigma$.

Therefore,  (2.6)  leads  to  unbiased  estimators  of  these  quantities  as  well  as  their 
exact  variances  as  opposed  to  only  asymptotic  results  obtainable  for  the  MLE's. 
However,  to  construct  confidence  bounds,  one  has  to  resort  to  large  sample  normal 
approximation  of  these  estimators  except  for  some  isolated  simple  models  where  an 
exact  pivotal  method  may  be  feasible,  cf.  McCool  (1980). 
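
The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the standardized pdf g is taken to be standard normal, the order-statistic means and covariances are approximated by simulation instead of being read from published tables, and the function names (`order_stat_moments`, `stage1_blue`, `two_stage`) are hypothetical.

```python
import numpy as np

def order_stat_moments(n, r, n_sim=20000, seed=0):
    """Monte Carlo approximation (an assumption of this sketch) to the means
    c_j and covariances of the first r order statistics of n standard
    normal variables; published tables would be used in practice."""
    rng = np.random.default_rng(seed)
    v = np.sort(rng.standard_normal((n_sim, n)), axis=1)[:, :r]
    return v.mean(axis=0), np.cov(v, rowvar=False)

def stage1_blue(y, c, A):
    """Stage 1: BLUE of (lambda_i, sigma) from one censored sample, by
    generalized least squares on y_j = lambda_i + sigma * V_j as in (2.2)."""
    Z = np.column_stack([np.ones_like(c), c])
    A_inv = np.linalg.inv(A)
    M = np.linalg.inv(Z.T @ A_inv @ Z)   # plays the role of the matrix in (2.4)
    est = M @ Z.T @ A_inv @ y
    return est[0], est[1], M

def two_stage(Y, X, c, A):
    """Y: k x r array of ordered log-lives, X: k x p design matrix."""
    stage1 = [stage1_blue(y, c, A) for y in Y]
    lam = np.array([s[0] for s in stage1])
    sig = np.array([s[1] for s in stage1])
    beta = np.linalg.solve(X.T @ X, X.T @ lam)   # stage 2, as in (2.6)
    return beta, sig.mean()

# Noise-free illustration: data lying exactly on the model are reproduced.
c, A = order_stat_moments(n=10, r=6)
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5]])
Y = (X @ np.array([2.0, -1.0]))[:, None] + 0.7 * c[None, :]
beta_tilde, sigma_tilde = two_stage(Y, X, c, A)
```

With real data the rows of `Y` would be the observed ordered log-lives at each stress setting; the noise-free input above is used only because its answer is known in advance.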

Bhattacharyya  and  Soejoeti  (1981)  examine  conditions  on  the  design  matrix 
X  and  the  underlying  log-life  distribution  g  for  the  asymptotic  normality  of  the 


ML and two-stage least squares estimators, and investigate the loss of asymptotic
($k \to \infty$) efficiency incurred by the latter. In particular, for the Weibull life
distribution, it is found that a fairly high efficiency is retained unless either
n is too small or r is too small compared to n. Nelson (1970) discusses
another two-stage estimation method where MLE is used in the first stage followed
by least squares in the second but one loses the exact properties (unbiasedness,
variances and covariances) in this process.


For the lognormal life model and type II right censored ALT data, Mehrotra
and Bhattacharyya (1985) develop another simple and highly efficient estimation
procedure using a judicious modification of the likelihood equations. Denoting

$$ y_i = (y_{i1},\ldots,y_{ir_i})', \quad y = (y_1',\ldots,y_k')', \quad r = \sum_{i=1}^{k} r_i, \quad z_{ij} = (y_{ij} - \beta'x_i)/\sigma, $$

they observe that the likelihood function is a product of the two components

$$
\begin{aligned}
L_1 &= \sigma^{-r}\exp[-(y - X\beta)'(y - X\beta)/(2\sigma^2)] \\
L_2 &= \prod_{i=1}^{k} [1 - \Phi(z_{ir_i})]^{n_i - r_i}
\end{aligned} \tag{2.7}
$$

where X is now the r × p matrix whose rows are $x_1',\ldots,x_k'$ repeated
$r_1,\ldots,r_k$ times, respectively, and $\Phi$ denotes the standard normal cdf. The factor
$L_1$ has the form of a full sample normal regression likelihood based on the sample
sizes $r_1,\ldots,r_k$ at the k design points. Complication in obtaining the MLE
arises because of $L_2$. A method of modified MLE is proposed by replacing
$\partial \log L_2/\partial\beta$ and $\partial \log L_2/\partial\sigma$ by their respective expectations in the likelihood
equations. It turns out that these modified likelihood equations lead to the
exact solutions

$$
\begin{aligned}
\hat\beta &= S^{-1}X'y - \hat\sigma S^{-1}a \\
\hat\sigma^2 &= c^{-1}\,y'(I - XS^{-1}X')y
\end{aligned} \tag{2.8}
$$

where S = X'X, and the constants a and c can be calculated by using the
tables of means and variances of the standard normal order statistics. Closed
form expressions, an easy computing algorithm, some exact small sample properties,
and little loss of asymptotic efficiency with light censoring are the principal
advantages of this method. A few other modifications of the likelihood
equations to obtain estimators in closed forms are discussed by Tiku (1978) and
Schneider (1984) for censored normal samples.

In  most  applications  of  the  parametric  LL  analysis,  the  shape  parameter  n 
is assumed to be independent of x. Glaser (1984) employs a more general
formulation with the Weibull distribution assuming that the reciprocal of the
shape parameter also has a linear model in terms of x. Iterative solutions of the
ML equations are discussed in the settings of grouped and censored ALT data.

Shaked  (1978)  discusses  ML  estimation  with  the  inverse  power  law  and  Arrhenius 
acceleration  functions  applied  to  some  linear  hazard  rate  type  distributions. 

3.  SEMI-PARAMETRIC  MODELS— PROPORTIONAL  HAZARDS  AND  TIME  ACCELERATION. 

3.1 Proportional Hazards Model. The LL model discussed in the preceding
section envisions a multiplicative effect of stress on the scale parameter and
hence on the mean as well as the percentiles of the life distribution. Another
approach to modeling the effect of stress focuses on the failure rate behavior.
The failure rate at age y of a unit undergoing a constant stress x is defined
as $h(y|x) = f(y|x)/\bar F(y|x)$ where f and $\bar F$ are respectively the pdf and
survival function of the life distribution. Let $h_0(y) = h(y|x_0)$ denote the
failure rate function under the use condition stress $x_0$. The proportional
hazards (PH) model assumes that stress acts multiplicatively on the failure rate,
that is $h(y|x) = h_0(y)\,g(x,\beta)$ where g is a positive function involving an
unknown parameter vector $\beta$ but is free of y. Cox (1972) proposed this idea
and further assumed an exponential form of g,

$$ h(y|x) = h_0(y)\exp(\beta'x) \tag{3.1} $$

arguing that this choice is "convenient, flexible and yet entirely empirical".
The model is semi-parametric because one component, namely, the acceleration
function is parameterized while the form of the use condition hazard $h_0(y)$ is
left completely arbitrary.

The PH model has spurred extensive research in statistical methodology with
applications targeted mainly to survival analysis in biostatistics. Also,
handling arbitrary or randomly censored data has been a focal point of these
developments. The parameter $\beta$ is usually viewed as the primary target of
inference while $h_0(y)$ is considered a nuisance function. In the context of ALT,
$h_0(y)$ or the corresponding life distribution $F(y|x_0)$ is of main interest while
an assessment of the significance of the stress effects is often redundant. More
importantly, the use of an empirical acceleration function with no physical
back-up is prone to criticism because this function plays a dominant role in
extrapolation and inferences on $h_0(y)$.

A comparison of the structures of the LL and PH models is in order here.
The well known relations between the failure rate, cumulative hazard and survival
functions (cf. Kalbfleisch and Prentice 1980) lead to the following equivalent
forms of the PH model:

$$
\begin{aligned}
\bar F(y|x) &= [\bar F(y|x_0)]^{\exp(\beta'x)} \\
\log[-\log \bar F(y|x)] &= \log[-\log \bar F(y|x_0)] + \beta'x.
\end{aligned} \tag{3.2}
$$

The second equation shows that, under the PH model, the stresses operate
additively on the log(-log)-survival function. By contrast,
the LL model assumes a linear form for the logarithm of the scale parameter, and
is therefore physically more meaningful. It entails that y|x has the same
distribution as that of $(y|x_0)\exp(-\beta'x)$, and this relation leads to the failure
rate relation

$$ h(y|x) = \exp(\beta'x)\,h_0[y\exp(\beta'x)]. \tag{3.3} $$

Obviously, the LL and PH models coincide if and only if $h_0(y)$ is proportional to
a power of y, that is, the underlying life distribution is Weibull.
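
The "if" direction can be checked in one line. The worked equation below is an illustrative sketch; the symbol c for the Weibull shape is introduced here only for this purpose and is not taken from the source.

```latex
% Weibull baseline hazard with (hypothetical) shape symbol c: h_0(y) = c y^{c-1}.
% Substituting into the LL failure rate relation (3.3):
\begin{aligned}
\exp(\beta'x)\, h_0[y \exp(\beta'x)]
  &= \exp(\beta'x)\; c\, y^{c-1} \exp[(c-1)\beta'x] \\
  &= c\, y^{c-1} \exp(c\,\beta'x)
   = h_0(y)\exp[(c\beta)'x].
\end{aligned}
```

The right-hand side has the PH form (3.1) with coefficient vector $c\beta$ in place of $\beta$, so for a Weibull baseline the two models describe the same family, with regression coefficients that differ by the factor c.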

A more general class of models is formulated by Ciampi and Etezadi-Amoli
(1985) by embedding both LL and PH failure rate functions in a common frame:

$$ h(y|x) = \exp(\alpha'x)\,h_0[y\exp(\beta'x)]. \tag{3.4} $$

This reduces to LL if $\alpha = \beta$ and to PH if $\beta = 0$. They study asymptotic
likelihood ratio tests for model discrimination under the further assumption that
$h_0$ is a polynomial. It is not clear if such an over-parameterization is
necessary or meaningful in ALT analysis. The model being purely empirical, its
use in ALT is questionable.

3.2 Time-Acceleration Model. The concept of a failure-time acceleration
or shortening of the life-time under increased stress has prevailed in much of the
historical development of the ALT models. A simple formulation was advanced by
Allen (1959) and its ramifications treated later by several authors. To introduce
the basic idea, suppose $\bar F_0(y)$ and $\bar G(y)$ denote the survival functions under the
use condition stress and an accelerated stress condition, respectively. A
relation between them is modeled as $\bar G(y) = \bar F_0[v(y)]$ with a "time-acceleration"
function v(y). Allen (1959) calls it a strict acceleration if $v(y) \ge y$ for
all y (it is understood that v(y) is not identically equal to y), and a
restricted acceleration if v(y) < y holds on a finite interval and v(y) > y
on an infinite interval. Note that a strict acceleration is equivalent to the use
condition life being stochastically larger than that under the accelerated stress.
Barlow and Scheuer (1971) considered nonparametric estimation of $F_0$ and G under
the assumptions that both are IFRA distributions, v(t) is arbitrary, and data are
available from both $F_0$ and G.

Lacking data from $F_0$, as is usually the case with ALT experiments, one
must specify a structure of v(t) to be able to estimate $F_0$. A semi-parametric
formulation, proposed by Shaked, Zimmer and Ball (SZB) (1979), assumes that the
stress x acts on the survival by means of a change of the time scale,

$$ v(y) = g(x,\beta)\,y $$

where g is a specified function of x involving an unknown parameter $\beta$,
and the distribution $F_0(y)$ is arbitrary. Note that the choice $g(x,\beta) =
\exp(\beta'x)$ leads to the structure of the LL model of Section 2, the sole difference
being that $F_0(y)$ is left nonparametric in the present formulation.

Consider the case of a single stress variable x and a scalar parameter
$\beta$. Suppose that k accelerated stress settings $x_i$ are used, $n_i$ units are
tested at $x_i$ and all failure times $y_{ij}$, $j = 1,\ldots,n_i$, $i = 1,\ldots,k$ are
observed. The model entails that $y|x_i$ has the same distribution as $\theta_{ii'}(y|x_{i'})$
where $\theta_{ii'} = g(x_{i'},\beta)/g(x_i,\beta)$. Based on this observation, SZB (1979) propose a
simple inference procedure along the following steps:

(i) Using the data from each pair of stress settings $(x_i, x_{i'})$,
obtain a consistent estimator $\hat\theta_{ii'}$ of the ratio of scales
such as $\hat\theta_{ii'} = \bar y_i/\bar y_{i'}$ where $\bar y_i = n_i^{-1}\sum_j y_{ij}$.

(ii) Obtain $\hat\beta_{ii'}$ by solving the equation

$$ \hat\theta_{ii'} = g(x_{i'},\beta)/g(x_i,\beta). $$

Repeating this for all pairs, get k(k-1)/2 estimators of $\beta$.

(iii) Form the pooled estimator $\hat\beta = \sum_{1 \le i < i' \le k} w_{ii'}\hat\beta_{ii'}$ using the
weights $w_{ii'}$ inversely proportional to the asymptotic
variances of $\hat\beta_{ii'}$.

(iv) Rescale the observed failure times to pseudo-values at the
use condition stress:

$$ \tilde y_{ij} = \frac{g(x_i,\hat\beta)}{g(x_0,\hat\beta)}\,y_{ij}, \quad j = 1,\ldots,n_i, \; i = 1,\ldots,k. $$

(v) Act as if these pseudo-values constitute a random sample of
size $N = \sum_{i=1}^{k} n_i$ from the distribution $F_0(y)$ in order to
estimate the mean, percentiles or other features of $F_0$ or
even the whole function $F_0(y)$.

Shaked  and  Singpurwalla  (1982)  discuss  goodness-of-fit  tests  along  these  lines. 
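
The five steps can be sketched as follows. This is a hedged illustration, not the SZB code: it assumes the power-rule form g(x, β) = x^β (a hypothetical choice made for the example), reads v(y) = g(x,β)y as saying that a lifetime at stress x is distributed as a baseline lifetime divided by g(x, β), and derives the pooling weights from a delta-method variance for log(ȳ_i/ȳ_i'). The names `g` and `szb_estimate` are hypothetical.

```python
import itertools
import numpy as np

def g(x, beta):
    # Hypothetical power-rule acceleration factor (assumption of this sketch).
    return x ** beta

def szb_estimate(samples, stresses, x0):
    """samples: list of 1-D arrays of failure times, one per stress setting."""
    means = [np.mean(s) for s in samples]
    sizes = [len(s) for s in samples]
    betas, weights = [], []
    for i, j in itertools.combinations(range(len(stresses)), 2):
        theta_hat = means[i] / means[j]                  # step (i)
        log_ratio = np.log(stresses[j] / stresses[i])
        # step (ii): solve theta_hat = g(x_j, beta) / g(x_i, beta) for beta
        betas.append(np.log(theta_hat) / log_ratio)
        # step (iii): weight proportional to 1 / asymptotic variance; under a
        # pure scale change the coefficient of variation is common to all
        # settings, so it cancels when the weights are normalized.
        weights.append(log_ratio ** 2 / (1.0 / sizes[i] + 1.0 / sizes[j]))
    weights = np.array(weights) / np.sum(weights)
    beta_hat = float(np.dot(weights, betas))
    # step (iv): pseudo-values at the use condition stress x0
    pseudo = np.concatenate([
        s * g(x, beta_hat) / g(x0, beta_hat) for s, x in zip(samples, stresses)
    ])
    return beta_hat, pseudo   # step (v) treats `pseudo` as a sample from F_0

# Illustration with lifetimes that follow the scale model exactly.
base = np.array([1.0, 2.0, 3.5, 5.0])
stresses = [2.0, 3.0, 4.0]
samples = [base / g(x, 1.5) for x in stresses]
beta_hat, pseudo = szb_estimate(samples, stresses, x0=1.0)
```

Any estimator of a feature of $F_0$ (sample mean, empirical percentiles, empirical cdf) can then be applied to `pseudo`.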

The appealing features of the above procedure are its simplicity, which is an
attraction to practitioners, and its avoidance of the assumption of a specific
parametric life distribution as is involved in the LL analysis. However, large
sample sizes are needed for its validity, and that is in essence the price paid
for forgoing a parametric assumption. Like the LL model it does have a parametric
assumption for the acceleration function and that plays a crucial role in
extrapolation. In light of this, whether one chooses a flexible parametric family for
$F_0(y)$ or leaves it nonparametric is not of much practical import in model fitting
and inference.

Proschan  and  Singpurwalla  (1979,  1980)  discuss  a  Bayesian  approach  which 
circumvents  the  need  for  choosing  a  specific  parametric  acceleration  function  as 
well  as  the  form  of  the  life  distribution.  However,  they  assume  that  prior 
information  in  regard  to  the  average  failure  rates  over  disjoint  time  intervals 
under  each  accelerated  condition  is  available,  and  that  least  squares  fit  of  a 
linear  relation  among  the  posterior  average  failure  rates  can  be  extended  to  the 
use  condition  stress. 

4.  STOCHASTIC  DAMAGE  GROWTH  —  AN  INVERSE  GAUSSIAN  REGRESSION  MODEL.  In 
this  section,  we  discuss  a  parametric  approach  based  on  a  life  distribution  which 
derives  from  a  stochastic  model  of  fatigue  or  growth  of  damage  in  a  material.  In 
contrast  with  direct  modeling  of  the  time-acceleration  function  or  the  failure  rate 
behavior  discussed  in  the  previous  sections,  here  the  rate  parameter  of  the  damage 
growth  process  is  modeled  in  relation  to  the  stress. 

Specifically, we assume that given a constant operating environment,
depletion of strength or growth of damage of a material specimen over time follows
a Brownian motion process with drift $\mu > 0$ and diffusion constant $\delta^2$, and that
the material fails when the accumulated damage exceeds a critical level $\omega > 0$.
Let X(t) denote the accumulated damage during the time interval [0,t]. The
time-to-failure is then given by $y = \inf\{t: X(t) > \omega\}$ which is the first passage
time of the process across $\omega$. The above assumptions lead to the following pdf of
y:

$$ f(y) = (2\pi\sigma y^3)^{-1/2}\exp\left[-\left(\frac{y}{\theta} - 1\right)^2 \Big/ (2\sigma y)\right], \quad 0 < y < \infty \tag{4.1} $$

where $\theta = \omega/\mu$, $\sigma = \delta^2/\omega^2$, mean = $\theta$, and variance = $\theta^3\sigma$. This distribution is
known as a Gaussian first passage time distribution in the stochastic processes
literature, and is more commonly called the inverse Gaussian distribution,
IG($\theta$,$\sigma$), in the statistical literature. Its analogy with and advantages over the
Birnbaum-Saunders (1969) fatigue life distribution are discussed by Bhattacharyya
and Fries (BF) (1982b).
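
As a quick numerical check of the parameterization in (4.1), the sketch below evaluates the density on a grid and confirms that it integrates to one with mean θ and variance θ³σ. The function names and the numerical values of μ, δ² and ω are illustrative assumptions.

```python
import numpy as np

def ig_pdf(y, theta, sigma):
    """Density (4.1) of IG(theta, sigma): mean theta, variance theta^3 * sigma."""
    y = np.asarray(y, dtype=float)
    return (2.0 * np.pi * sigma * y ** 3) ** -0.5 * np.exp(
        -((y / theta - 1.0) ** 2) / (2.0 * sigma * y)
    )

def ig_from_damage(mu, delta2, omega):
    """Map drift mu, diffusion constant delta^2 and critical level omega
    of the damage process to the pair (theta, sigma)."""
    return omega / mu, delta2 / omega ** 2

def trapezoid(f, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

theta, sigma = ig_from_damage(mu=0.5, delta2=0.5, omega=1.0)   # theta = 2, sigma = 0.5
grid = np.linspace(1e-6, 200.0, 400001)
dens = ig_pdf(grid, theta, sigma)
total = trapezoid(dens, grid)                        # close to 1
mean = trapezoid(grid * dens, grid)                  # close to theta
var = trapezoid((grid - mean) ** 2 * dens, grid)     # close to theta**3 * sigma
```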

In the context of ALT, the parameter $\mu$, which represents the mean damage
growth per unit of time, is the natural choice for constructing an acceleration
function in relation to the stress x. A simple and flexible formulation due to
BF (1982a, 1986) postulates a linear regression model for $\mu$ and assumes $\omega$ and
$\delta^2$ to be constants independent of x. The latter assumption is in the spirit of
the homoscedasticity assumption in the normal theory regression analysis. Thus,
the distribution of the failure time under stress x, y|x, is taken to be
IG($\theta(x)$,$\sigma$) whose mean $\theta(x)$ depends on the stress x (a p-vector) according to
the reciprocal-linear model $\theta^{-1}(x) = \beta'x$, and $\sigma$ is independent of x.


To discuss statistical inferences with the above model, we consider an ALT
experiment with k settings of x, and a random sample of $n_i$ failure times $y_{ij}$,
$j = 1,\ldots,n_i$ observed at the setting $x_i$, $i = 1,\ldots,k$. Let N, $\bar y_i$, $\bar y$ respectively
denote the total sample size, the ith sample mean and the grand mean,
$\bar r = N^{-1}\sum\sum_{ij} y_{ij}^{-1}$, the grand mean of reciprocals of the observations,
$V = \sum\sum_{ij}(y_{ij}^{-1} - \bar y^{-1})$, the total reciprocal deviation, and define the matrices

$$ D = \mathrm{diag}(\bar y_1,\ldots,\bar y_k), \quad C = \mathrm{diag}(n_1,\ldots,n_k), \quad X' = (x_1,\ldots,x_k), \quad S = X'CDX. $$

Referring to (4.1) and the regression model $\theta_i^{-1} = x_i'\beta$, the likelihood function
L can be written in the form

$$ L \propto \sigma^{-N/2}\exp\left[-\frac{1}{2\sigma}Q(\beta)\right] \tag{4.2} $$

where

$$ Q(\beta) = (DX\beta - \mathbf{1})'CD^{-1}(DX\beta - \mathbf{1}) + V. \tag{4.3} $$

From (4.2) and (4.3), BF (1986) show that the unique roots of the likelihood
equations,

$$ \hat\beta = S^{-1}\sum_i n_i x_i, \qquad \hat\sigma = N^{-1}Q(\hat\beta) \tag{4.4} $$

provide efficient likelihood estimators, that is, they are consistent,
asymptotically normal and asymptotically equivalent to the MLE's. They further
exploit some convenient features of the likelihood function to arrive at an
analysis of reciprocals (ANOR) table along the ideas of the analysis of variance
table in the normal theory linear model analysis. The ANOR table rests on the
decomposition of the total corrected sum of reciprocals

$$ V = Q_{Reg} + Q_L + Q_E $$

where the components on the righthand side measure the contributions due to
regression, lack of fit, and pure error, respectively, and are given by

$$
\begin{aligned}
Q_{Reg} &= \sum_i n_i x_i'\hat\beta - N\bar y^{-1} \\
Q_L &= \sum_i n_i(\bar y_i^{-1} - x_i'\hat\beta) \\
Q_E &= \sum\sum_{ij}(y_{ij}^{-1} - \bar y_i^{-1}).
\end{aligned}
$$

Consideration  of  likelihood  ratio  tests  along  with  a  judicious  intermix  of  exact 
distribution  theory  of  IG  and  asymptotic  theory  further  lead  to  approximate  F 
tests  for  the  relevant  hypotheses. 
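
A minimal numerical sketch of the reciprocal linear fit and the ANOR decomposition follows. It is an illustration under stated assumptions rather than the BF procedure itself: the estimate of σ is computed directly from the exponent of the density (4.1), the pure error term is taken as the within-setting sum of reciprocal deviations, the lack of fit term as Σ n_i(ȳ_i⁻¹ - x_i'β̂), and the regression term as the remainder of V. The function name and the data are hypothetical.

```python
import numpy as np

def ig_reciprocal_fit(samples, X):
    """samples: list of k arrays of failure times; X: k x p design matrix."""
    n = np.array([len(s) for s in samples], dtype=float)
    ybar_i = np.array([np.mean(s) for s in samples])
    N = n.sum()
    ybar = sum(np.sum(s) for s in samples) / N
    S = X.T @ ((n * ybar_i)[:, None] * X)        # S = X'CDX
    beta = np.linalg.solve(S, X.T @ n)           # beta-hat = S^{-1} sum n_i x_i
    fitted = X @ beta                            # fitted reciprocal means x_i'beta
    # sigma-hat from the exponent of (4.1), summed over all observations
    q = sum(np.sum((s * m - 1.0) ** 2 / s) for s, m in zip(samples, fitted))
    sigma = q / N
    V = sum(np.sum(1.0 / s - 1.0 / ybar) for s in samples)
    Q_reg = float(np.dot(n, fitted) - N / ybar)
    Q_L = float(np.dot(n, 1.0 / ybar_i - fitted))
    Q_E = sum(np.sum(1.0 / s - 1.0 / m) for s, m in zip(samples, ybar_i))
    return beta, sigma, (V, Q_reg, Q_L, Q_E)

# Made-up failure times at three stress settings, intercept plus one stress.
samples = [np.array([1.2, 0.8, 1.5]),
           np.array([0.6, 0.9, 0.7, 1.1]),
           np.array([0.4, 0.5, 0.45])]
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
beta_hat, sigma_hat, (V, Q_reg, Q_L, Q_E) = ig_reciprocal_fit(samples, X)
```

With these definitions the three components add up to V exactly, and all three are nonnegative when the model contains an intercept.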

Other  developments  in  the  area  of  IG  reciprocal  linear  model  include: 
construction  of  standardized  IG  residuals  and  their  plots  for  a  graphical  model 
checking,  construction  of  unbiased  estimators  via  least  squares  applied  to  the 
reciprocals  (BF  1982b),  determination  of  optimal  designs  by  minimizing  a  finite 
sample  version  of  the  asymptotic  generalized  variance  (Fries  and  Bhattacharyya 
(FB)  1986),  and  analysis  of  factorial  life  test  experiments  (FB  1983). 

The  method  of  ALT  analysis  discussed  in  this  section  rests  on  a  parametric 
formulation  much  in  line  with  the  model  presented  in  Section  2.  The  IG 
distribution  as  a  life  model  has  a  sound  theoretical  basis,  and  the  family  is 
flexible  enough  to  fit  most  real  life  data  just  as  the  lognormal  and  Weibull 
families.  Moreover,  the  reciprocal  linear  model  as  an  acceleration  function 
derives  from  a  plausible  assumption  about  the  damage  caused  by  stress.  Taken 
together,  the  methodology  of  this  section  has  several  desirable  features:  a 
physical  basis  of  the  model,  flexibility  of  empirical  fit,  tractability  of 
statistical  inferences  and  availability  of  model  checking  procedures.  However, 
simple  methods  of  statistical  inferences  with  censored  data  are  still  not 
available  for  this  model  and  further  work  in  this  direction  is  needed. 

5.  STEP-STRESS  ALT.  The  preceding  sections  were  concerned  with  the  ALT 
studies  where  each  unit  is  subjected  to  a  constant  level  of  stress  until  failure 
occurs  or  the  observation  is  censored.  Another  widely  used  method  of  conducting 
an  ALT  experiment,  called  a  step-stress  ALT,  allows  the  stress  setting  of  a  unit 
to  be  changed  at  discrete  points  of  time.  Stress  changes  may  be  effected  at 
preset  times  or  upon  occurrence  of  a  fixed  number  of  failures  along  the  ideas  of 
type  I  and  type  II  censoring,  respectively.  Applications  of  step-stress  ALT  are 
cited  by  Nelson  (1980),  Bora  (1979)  and  Miller  and  Nelson  (1983)  in  the  contexts 
of  failure  of  cable  insulation  under  voltage  stress,  life  testing  of  diodes,  and 
dielectric breakdown of insulating fluid, respectively.

In an ordinary fixed-time step-stress experiment, a random sample of N
units is simultaneously exposed to a stress setting $x_1$, observed over a fixed
time $t_1$, and the failure times of those failing in this interval are recorded.
At time $t_1$ the surviving units are subjected to a different stress setting $x_2$
and observed till they all fail. Such an experiment is called a two-step or
simple  step-stress  ALT.  The  idea  extends  to  more  than  two  steps  in  an  obvious 
way.  Moreover,  the  failure  observations  at  the  terminal  step  may  be  censored  at  a 
fixed  time.  The  intent  of  such  an  experiment  is  to  collect  more  failure  time  data 
in  a  limited  time  horizon  without  necessarily  using  high  stresses  to  all  the 
units.  With  an  initial  low  stress,  a  unit  may  tend  to  survive  too  long  in  which 
case  observation  of  its  actual  failure  time  would  be  lost  due  to  censoring.  That 
can  be  prevented  by  increasing  the  stress  at  an  intermediate  point  thus  increasing 
the  chance  of  an  early  failure.  In  principle,  an  initial  high  stress  can  be 
followed  by  a  lower  one  in  the  second  step  but  the  motivation  of  using  this 
pattern  is  not  transparent. 

As with a constant stress experiment, the goal of statistical analysis of
step-stress ALT data is to draw inferences on $F_0(y) = F(y|x_0)$, the life
distribution corresponding to the constant use condition stress $x_0$. For this
to be possible, we must have a model that relates the step-stress life
distribution to the constant stress life distribution $F_0(y)$. A sensible
formulation, called a cumulative exposure (CE) model, was proposed by Nelson
(1980). It assumes that "the remaining life of specimens depends only on the
current cumulative fraction failed and current stress -- regardless how the
fraction accumulated. Moreover, if held at the current stress, survivors will
fail according to the cdf for that stress but starting at the previously
accumulated fraction failed." To formalize this idea, we let $F_i(y)$ stand for
$F(y|x_i)$, the life distribution under the constant stress $x_i$, and let G(y)
denote the life distribution under a two-step (first $x_1$ and then $x_2$) stress.
The CE model entails that


$$
G(y) = \begin{cases}
F_1(y) & \text{for } y \le t_1 \\
F_2(s_1 + y - t_1) & \text{for } t_1 < y < \infty
\end{cases} \tag{5.1}
$$

where $s_1$ is the solution of $F_2(s_1) = F_1(t_1)$. Initially, G is the same as
$F_1$. At time $t_1$, it switches to the function $F_2$ but starting with the value
$F_1(t_1)$. Thus G(y) is made up of segments of the constant stress life
distributions $F_1$ and $F_2$, pieced together at the change point of stress. Note
that this formulation is different from the mixture models as well as the change
point models that appear in some areas of the statistical literature.

With the general formulation (5.1), a parametric model can be constructed
by taking $F_1$ and $F_2$ to be members of a common parametric family along with an
LL model of relation between them. For example, use of the Weibull model
$\bar F(y|x) = \exp[-(y/\theta(x))^{\beta}]$ in conjunction with the inverse power law $\theta(x) =
(A/x)^P$ and equation (5.1) leads to the step-stress life distribution, written
in terms of the survival function $\bar G = 1 - G$,

$$
\bar G(y) = \begin{cases}
\exp\{-[y(x_1/A)^P]^{\beta}\}, & 0 < y \le t_1 \\
\exp\{-[t_1(x_1/A)^P + (y - t_1)(x_2/A)^P]^{\beta}\}, & t_1 < y < \infty
\end{cases} \tag{5.2}
$$

where A, P and $\beta$ are unknown parameters. Nelson (1980) and Miller and Nelson
(1983)  discuss  maximum  likelihood  estimation  under  this  type  of  parametric  models 
where  the  underlying  life  distribution  is  taken  to  be  exponential  or  Weibull,  and 
the  acceleration  function  either  Arrhenius  or  the  inverse  power  law.  They  also 
illustrate  application  to  data  of  some  step-stress  ALT  experiments. 
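
The construction in (5.1) and its Weibull special case (5.2) can be compared numerically. The sketch below is illustrative only; the function names and all parameter values are assumptions made for the example.

```python
import numpy as np

def weibull_cdf(y, theta, shape):
    return 1.0 - np.exp(-(y / theta) ** shape)

def weibull_quantile(p, theta, shape):
    return theta * (-np.log1p(-p)) ** (1.0 / shape)

def ce_cdf(y, t1, theta1, theta2, shape):
    """Two-step cumulative exposure cdf (5.1) with Weibull F_1 and F_2."""
    y = np.asarray(y, dtype=float)
    # s1 solves F_2(s1) = F_1(t1)
    s1 = weibull_quantile(weibull_cdf(t1, theta1, shape), theta2, shape)
    after = weibull_cdf(s1 + np.maximum(y - t1, 0.0), theta2, shape)
    return np.where(y <= t1, weibull_cdf(y, theta1, shape), after)

def ce_survival_closed_form(y, t1, x1, x2, A, P, shape):
    """Closed form (5.2) under the inverse power law theta(x) = (A/x)^P."""
    y = np.asarray(y, dtype=float)
    expo = np.where(y <= t1,
                    y * (x1 / A) ** P,
                    t1 * (x1 / A) ** P + (y - t1) * (x2 / A) ** P)
    return np.exp(-expo ** shape)

# Illustrative values: stress raised from x1 = 2 to x2 = 4 at time t1 = 5.
A, P, shape, x1, x2, t1 = 10.0, 2.0, 1.5, 2.0, 4.0, 5.0
theta1, theta2 = (A / x1) ** P, (A / x2) ** P
y = np.linspace(0.1, 30.0, 50)
gap = np.max(np.abs(1.0 - ce_cdf(y, t1, theta1, theta2, shape)
                    - ce_survival_closed_form(y, t1, x1, x2, A, P, shape)))
```

For a scale family such as this one, $s_1 = t_1\theta_2/\theta_1$, which is why the exponent in (5.2) is a sum of the exposure times weighted by the reciprocal scales.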


A physical basis of the CE model in step-stress ALT is not as transparent
as its mathematical formulation. Earlier, in a similar context, DeGroot and Goel
(1979) advanced a time-acceleration model which is physically more meaningful.
They assume that the effect of switching the stress from $x_1$ to $x_2$ is to
multiply the remaining life of the unit by some unknown factor $\alpha$, a function of
$x_1$ and $x_2$ ($\alpha < 1$ if $x_2$ is more severe than $x_1$). Letting $y_1$ denote the
life-length under the constant stress $x_1$ and $y^*$ that under the step-stress
pattern (switching from $x_1$ to $x_2$ at time $t_1$), they formulate the relation

$$
y^* = \begin{cases}
y_1 & \text{if } y_1 \le t_1 \\
t_1 + \alpha(y_1 - t_1) & \text{if } t_1 < y_1
\end{cases} \tag{5.3}
$$

and call $y^*$ a tampered random variable. It can be seen that (5.3) becomes a
special case of (5.1) if $F_1$ and $F_2$ differ only by a scale parameter. In this
sense, (5.1) accommodates a more general formulation by allowing other parameters
of the life distribution to change with stress, although such a generalization
obscures the physical meaning of the model and has not yet been used in any
application. DeGroot and Goel (1979) only consider the setting of a
"partially accelerated life test" viewing $x_1$ as the use condition stress and
$x_2$ the single accelerated stress so a specification of the acceleration function,
relating $\alpha$ to x, is not necessary. On the other hand, they allow $t_1$ to be
different for different units. Considering the underlying life distribution to be
exponential, they study the issue of optimal design in the framework of Bayesian
decision theory along with the specification of some cost function. Goel (1975)
discusses the asymptotic properties of MLE in the above setting.

Curiously, with the assumption of an exponential distribution but without
any reference to ALT, the above model also appears in the literature under a
different name -- a change point hazard model. The formulation, which is in terms
of the failure rate function, is

$$
h(y) = \begin{cases}
\lambda_1 & \text{if } y \le t_1 \\
\lambda_2 & \text{if } y > t_1,
\end{cases} \tag{5.4}
$$

and it leads to the life distribution

$$
g(y) = \begin{cases}
\lambda_1 e^{-\lambda_1 y} & \text{if } y \le t_1 \\
\lambda_2 e^{-\lambda_1 t_1 - \lambda_2 (y - t_1)} & \text{if } t_1 < y < \infty.
\end{cases} \tag{5.5}
$$

Except for a change of notation, it is identical with the model (5.3) of a
tampered exponential random variable. However, in the context of a change point
hazard, the time point of change $t_1$ is regarded as an unknown parameter in
addition to the failure rates $\lambda_1$ and $\lambda_2$. Here, the standard asymptotic theory
of MLE does not apply. In fact, one faces the problem of non-existence of the MLE.
Nguyen et al. (1984) and Matthews and Farewell (1982) discuss parameter estimation
and testing the hypothesis of no change, and also provide references to earlier
works in this area.
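
The stated identity between the change point hazard model and the tampered exponential variable of (5.3) can be verified directly. The sketch below assumes $y_1$ is exponential with rate $\lambda_1$ and uses the correspondence $\lambda_2 = \lambda_1/\alpha$, which follows from substituting (5.3) into the exponential survival function; the function names and numerical values are illustrative.

```python
import numpy as np

def changepoint_survival(y, lam1, lam2, t1):
    """Survival function implied by the piecewise constant hazard (5.4)."""
    y = np.asarray(y, dtype=float)
    cum_hazard = np.where(y <= t1, lam1 * y, lam1 * t1 + lam2 * (y - t1))
    return np.exp(-cum_hazard)

def changepoint_pdf(y, lam1, lam2, t1):
    """Density (5.5), obtained as hazard times survival."""
    y = np.asarray(y, dtype=float)
    hazard = np.where(y <= t1, lam1, lam2)
    return hazard * changepoint_survival(y, lam1, lam2, t1)

def tampered_survival(y, lam1, alpha, t1):
    """P(y* > y) when y1 is exponential with rate lam1 and y* follows (5.3)."""
    y = np.asarray(y, dtype=float)
    # y* > y  is the same event as  y1 > t1 + (y - t1) / alpha  when y > t1
    y1_equiv = np.where(y <= t1, y, t1 + (y - t1) / alpha)
    return np.exp(-lam1 * y1_equiv)

lam1, alpha, t1 = 0.2, 0.25, 3.0
lam2 = lam1 / alpha                  # failure rate after the stress change
y = np.linspace(0.0, 20.0, 201)
same = np.allclose(changepoint_survival(y, lam1, lam2, t1),
                   tampered_survival(y, lam1, alpha, t1))
```

In the ALT setting $t_1$ is fixed by the experimenter, whereas in the change point literature it is one more unknown, which is the source of the irregular likelihood behavior noted above.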


6.  DESIGNING  AN  ALT.  A  carefully  planned  life  test  experiment  is  at  the 
heart  of  success  in  gathering  informative  data,  coping  with  the  constraints  of 
cost  and  time,  and  arriving  at  effective  inferences  as  well  as  identifying 
directions  of  further  investigation.  Among  many  issues  involved  in  planning  an 
ALT  experiment,  some  are  to  be  resolved  from  an  understanding  of  the  physics  of 
failure.  These  include  choice  of  the  stress  variable(s),  choice  of  the 
acceleration  function  consistent  with  a  physical  model  of  the  failure  process,  and 
decision  regarding  the  range  of  stress  acceleration  which  would  be  feasible  and 
dependable for the purposes of extrapolation. Moreover, accepted engineering
practice in a given context should guide the choice between a constant stress
ALT and a step-stress ALT experiment.


Consider the most common type of ALT where a single stress x is
accelerated, and denote by $x_L$ and $x_H$ the intended lowest and highest settings
of x. As before, we denote the use condition stress by $x_0$ so $x_0 < x_L < x_H$.
With a constant stress ALT, one needs to determine the number k of stress
settings to be used, their locations in the interval $[x_L, x_H]$, the allocation of
a given total number N of units to the various stress settings, the period of
observation and the scheme of censoring. Unlike the situation of normal theory
regression analysis or least squares fitting of multiple regression with complete
data, a statistical treatment of optimal ALT plans is made complicated by the fact
that the important parametric life distribution models do not lead to exact
results for the sampling distribution of the relevant estimators or manageable
expressions for their variances especially in the case of type I censored data.
Faced with this pervasive difficulty, one reasonable approach to address the issue
of optimal test plans is based on large sample theory of ML estimators. Nelson
and  Kielpinski  (1975,  1976)  and  Nelson  and  Meeker  (1978)  discuss  several  test 
plans  along  this  line.  Their  main  developments  are  outlined  below. 

The  specifications  involved  in  their  development  of  optimal  test  plans 
include:  a  parametric  life  distribution  such  that  the  log-life  conforms  to  a 
location-scale  family,  an  engineering  acceleration  function  that  conforms  to  the 
log-linear  model  (such  as  the  Arrhenius  or  inverse  power  law),  a  total  sample  size 
N and a common censoring time T (determined from cost and schedule constraints),
and the highest stress setting x_H (to be set as high as possible subject to validity
of the model). The object of inference is to estimate η(x_0), the median log-life,
or more generally ξ_P(x_0), the 100P-th percentile of the life distribution at the
use condition stress x_0. Two kinds of test plans, the best standard plans and
the optimal two-point plans, are discussed in this setting.
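
For concreteness, the setting just described can be written out as below. The symbols β_0, β_1, σ, ε and z_P are notation assumed here for illustration and need not match the notation used elsewhere in the paper; x stands for the stress after any transformation required by the acceleration function (for example, reciprocal absolute temperature for the Arrhenius model or log stress for the inverse power law).

```latex
\begin{align*}
Y = \log T &= \beta_0 + \beta_1 x + \sigma\,\varepsilon, \\
\eta(x_0) &= \beta_0 + \beta_1 x_0, \\
\xi_P(x_0) &= \beta_0 + \beta_1 x_0 + z_P\,\sigma .
\end{align*}
```

Here ε follows the standard member of the location-scale family (standard normal for lognormal life, standard smallest extreme value for Weibull life) and z_P is its 100P-th percentile. In the lognormal case z_{0.5} = 0, so the location η(x_0) is the median log-life at the use condition.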

A  standard  plan,  so  called  because  of  its  popularity  among  practitioners, 
is one that uses k equispaced stresses in a suitably transformed scale
and an equal number of test units at each stress. Given k, the best standard
plan seeks to determine the x_L that minimizes the asymptotic variance of
η̂(x_0), the MLE of η(x_0). An optimal two-point plan uses k = 2 and finds the
x_L and the proportion of units tested at x_L so as to minimize the
asymptotic variance of η̂(x_0). To arrive at these plans for the lognormal life
model, Nelson and Kielpinski (1976) start with the asymptotic theory of MLE,
compute the Fisher information matrix, and use the delta method to deduce an
expression for the asymptotic variance of η̂(x_0). Minimization of this function
is done numerically on a computer with various input values of the model
parameters  and  other  quantities  that  are  fixed  in  advance,  and  thereby  charts  are 
prepared  for  guidance  to  the  practitioner.  Nelson  and  Meeker  (1978)  discuss  such 
plans for the case of the Weibull distribution along with the inverse power law
acceleration. It is found that for the case of two-point designs, the optimal
plan typically allocates more units to the low stress and requires a slightly
lower x_L than the best standard plan. Similar issues are also discussed by
Meeker and Hahn (1977) in the context of success-failure data and a logistic
regression model.
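
The computation outlined in this paragraph can be sketched in a few lines of Python for the lognormal model with a common type I censoring time. This is an illustrative sketch under stated assumptions, not the cited authors' program: the stress is standardized as s = (x - x_0)/(x_H - x_0), so that s = 0 at the use condition and s = 1 at the highest stress; a and b denote the standardized log censoring times (log T minus the location, divided by σ) at s = 0 and s = 1; the function names are hypothetical; and a crude grid search stands in for whatever numerical minimizer was actually used. Because the target is the median log-life at s = 0, the delta-method gradient is simply (1, 0, 0).

```python
import math


def info_elements(zeta):
    """Per-unit Fisher information elements (A, B, C), scaled by sigma^2, for the
    (location, scale) of a normal log-life that is right-censored at the
    standardized log-time zeta = (log T - location) / sigma."""
    surv = 0.5 * math.erfc(zeta / math.sqrt(2.0))    # probability of censoring
    if surv < 1e-300:                                # effectively uncensored
        return 1.0, 0.0, 2.0
    cdf = 0.5 * math.erfc(-zeta / math.sqrt(2.0))    # probability of failure
    pdf = math.exp(-0.5 * zeta * zeta) / math.sqrt(2.0 * math.pi)
    lam = pdf / surv                                 # normal hazard at zeta
    w = 1.0 + zeta * (zeta - lam)
    return cdf - pdf * (zeta - lam), -pdf * w, 2.0 * cdf - zeta * pdf * w


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            m = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= m * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def avar_median(plan, a, b):
    """(N / sigma^2) times the asymptotic variance of the MLE of the median
    log-life at the use condition. `plan` is a list of (s, proportion) pairs."""
    info = [[0.0] * 3 for _ in range(3)]
    for s, prop in plan:
        A, B, C = info_elements(a + (b - a) * s)     # censoring point at stress s
        unit = [[A, A * s, B], [A * s, A * s * s, B * s], [B, B * s, C]]
        for i in range(3):
            for j in range(3):
                info[i][j] += prop * unit[i][j]
    # Delta method with gradient g = (1, 0, 0): g' info^{-1} g is the first
    # component of the solution of info * v = g.
    return solve(info, [1.0, 0.0, 0.0])[0]


def best_two_point(a, b, steps=100):
    """Grid search over the low stress s_L and the proportion tested there;
    the remaining units are tested at s = 1."""
    best = None
    for i in range(1, steps):
        for j in range(1, steps):
            s_low, p_low = i / steps, j / steps
            v = avar_median([(s_low, p_low), (1.0, 1.0 - p_low)], a, b)
            if best is None or v < best[0]:
                best = (v, s_low, p_low)
    return best
```

A call such as `best_two_point(a=-2.0, b=2.0)` returns the scaled variance together with the low stress and the proportion of units placed there. The inputs a and b depend on the unknown parameters, so any plan obtained this way is only as good as the guessed values.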

It  is  to  be  noted  that  a  determination  of  these  optimal  plans  depends  on 
the  unknown  model  parameters  which  appear  in  the  expression  for  the  asymptotic 
variance  of  MLE.  Therefore,  one  must  have  an  informed  guess  of  the  parameter 
values  either  from  experience  with  similar  experiments  or  by  conducting  a 
preliminary  ALT  experiment.  Also,  a  drawback  of  the  two-point  plans  is  that  their 
optimality  rests  on  the  correct  choice  of  the  model  and,  at  the  same  time,  they 
provide little scope for checking lack-of-fit or violation of the model
assumptions. To remedy this drawback without departing too much from optimality,
best compromise plans are suggested. A compromise plan uses a third design point
x_M intermediate between x_L and x_H, with a small proportion of units tested at
x_M, and retains the same relative allocation to x_L and x_H as with the optimal
plan.
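
The allocation rule for a compromise plan amounts to simple arithmetic, sketched below. The function name and the numerical values in the usage note are hypothetical and chosen only for illustration; the source does not prescribe a particular proportion for the middle stress.

```python
def compromise_allocation(p_low_opt, p_mid):
    """Return the proportions of units at the (low, middle, high) stresses.

    p_low_opt: proportion at the low stress in the optimal two-point plan.
    p_mid:     small proportion diverted to the intermediate stress.
    The remaining units keep the optimal plan's low-to-high ratio.
    """
    if not (0.0 < p_low_opt < 1.0 and 0.0 <= p_mid < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    rest = 1.0 - p_mid
    return rest * p_low_opt, p_mid, rest * (1.0 - p_low_opt)
```

For instance, if the optimal two-point plan placed 80% of the units at the low stress and 10% of all units were moved to the middle stress, the split would be 72%, 10% and 18%, which preserves the 4-to-1 ratio between the low and high stresses.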

Meeker  (1984)  reports  an  extensive  simulation  study  for  the  purpose  of 
comparing  the  above  plans  along  with  a  few  others  determined  from  such 
requirements  as  equal  expected  number  of  failures  rather  than  equal  sample  sizes 
at  the  design  points  and  minimization  of  the  variance  of  some  other  parameter 
estimates.  The  principal  criteria  used  in  this  comparison  include:  quality  of 
estimation  under  the  chosen  model  (precision),  ability  to  detect  a  departure  from 
the assumed linear model (goodness-of-fit), sensitivity to misspecified parameter
values  (robustness)  and  ability  to  generate  adequate  failure  data  at  the  design 
points  (feasibility).  It  is  found  that  the  ALT  plans  that  are  theoretically 
optimal  have  serious  drawbacks  in  regard  to  the  other  criteria.  The  compromise 
plans  are  sub-optimal  but  are  more  robust  and  are  also  capable  of  detecting 
departures from the assumed model.

The  above  discussion  summarizes  the  recent  developments  on  ALT  designs  for 
the case of the type I censoring scheme and parametric log-linear analysis. Earlier
works  were  confined  largely  to  uncensored  data  under  the  exponential  model  with 
some  specific  acceleration  function  (Chernoff  1962)  or  the  standard  least  squares 
fitting  of  multiple  regression  (Herzberg  and  Cox  1972).  For  the  Weibull 
distribution  with  a  polynomial  function  for  the  log-scale  parameter,  Mann  (1972) 
discusses optimal test plans for estimating ξ_P(x_0) by means of a linear function
of order statistics rather than the MLE. Fries and Bhattacharyya (1985) study
optimal  ALT  designs  under  the  inverse  Gaussian  distribution  along  with  a 
reciprocal-polynomial  regression  model. 

Derringer  (1982)  points  out  that  in  order  to  observe  failures  with  a  single 
accelerated stress, one often requires settings so large that the validity of the
assumed  model  becomes  questionable.  To  remedy  the  danger  of  a  long-range 
extrapolation,  he  suggests  the  use  of  multiple  stress  acceleration  so  each  stress 
factor could be employed at relatively low levels and yet together they would
accomplish  the  purpose  of  a  single  large  stress.  This  is  also  logical  from  a 
practical  viewpoint  because  most  materials  or  systems  are  affected  by  several 
stresses  in  their  normal  operation.  However,  with  multiple  stress  acceleration 
one  needs  to  be  concerned  about  possible  interaction  of  the  stresses.  At  the  same 
time,  theoretical  modeling  of  the  acceleration  function  is  typically  more 
difficult  when  several  stresses  are  to  be  accelerated  simultaneously.  In 
essence,  the  choice  will  really  be  between  using  a  less  reliable  model  for  a 
short-range  extrapolation  and  a  more  reliable  model  for  a  long-range 
extrapolation.  For  an  effective  resolution  of  such  issues  there  ought  to  be 
sufficient  interaction  of  the  statistician  with  materials  scientists  and  engineers 
who  are  knowledgeable  about  the  mechanics  of  the  failure  process. 